Semantically Informed Data Augmentation for Unscoped Episodic Logical Forms

Mandar Juvekar, Gene Kim, Lenhart Schubert


Abstract
The Unscoped Logical Form (ULF) of Episodic Logic is a meaning representation format that captures the overall semantic type structure of natural language while leaving certain finer details, such as word sense and quantifier scope, underspecified for ease of parsing and annotation. While a learned parser exists to convert English to ULF, its performance is severely limited by the lack of a large training dataset. We present a ULF dataset augmentation method that samples type-coherent ULF expressions using the ULF semantic type system and uses a pretrained language model to filter out samples corresponding to implausible English sentences. Our data augmentation method is configurable with parameters that trade off sample plausibility against sample novelty and augmentation size. We find that the best configuration of this augmentation method substantially improves parser performance over training on the existing unaugmented dataset alone.
Anthology ID:
2023.iwcs-1.14
Volume:
Proceedings of the 15th International Conference on Computational Semantics
Month:
June
Year:
2023
Address:
Nancy, France
Editors:
Maxime Amblard, Ellen Breitholtz
Venue:
IWCS
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
116–133
URL:
https://aclanthology.org/2023.iwcs-1.14
Cite (ACL):
Mandar Juvekar, Gene Kim, and Lenhart Schubert. 2023. Semantically Informed Data Augmentation for Unscoped Episodic Logical Forms. In Proceedings of the 15th International Conference on Computational Semantics, pages 116–133, Nancy, France. Association for Computational Linguistics.
Cite (Informal):
Semantically Informed Data Augmentation for Unscoped Episodic Logical Forms (Juvekar et al., IWCS 2023)
PDF:
https://aclanthology.org/2023.iwcs-1.14.pdf