Neural Unsupervised Reconstruction of Protolanguage Word Forms

Andre He, Nicholas Tomlin, Dan Klein


Abstract
We present a state-of-the-art neural approach to the unsupervised reconstruction of ancient word forms. Previous work in this domain used expectation-maximization to predict simple phonological changes between ancient word forms and their cognates in modern languages. We extend this work with neural models that can capture more complicated phonological and morphological changes. At the same time, we preserve the inductive biases from classical methods by building monotonic alignment constraints into the model and deliberately underfitting during the maximization step. We evaluate our performance on the task of reconstructing Latin from a dataset of cognates across five Romance languages, achieving a notable reduction in edit distance from the target word forms compared to previous methods.
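The evaluation metric named in the abstract, edit distance between a reconstructed form and the target form, can be illustrated with a minimal sketch. It assumes plain Levenshtein distance over characters with unit costs; the paper's exact metric and tokenization (for example, phoneme-level comparison or length normalization) may differ, and the function name `edit_distance` is purely illustrative.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit insert/delete/substitute costs (assumed metric)."""
    # Row for the empty prefix of `source`: j insertions reach target[:j].
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Reaching the empty prefix of `target` takes i deletions.
        curr = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(
                prev[j] + 1,         # delete s_char
                curr[j - 1] + 1,     # insert t_char
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]
```

A lower value means the predicted form is closer to the attested one; averaging this quantity over a test set gives one simple way to compare reconstruction methods.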
Anthology ID:
2023.acl-long.91
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1636–1649
URL:
https://aclanthology.org/2023.acl-long.91
DOI:
10.18653/v1/2023.acl-long.91
Cite (ACL):
Andre He, Nicholas Tomlin, and Dan Klein. 2023. Neural Unsupervised Reconstruction of Protolanguage Word Forms. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1636–1649, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Neural Unsupervised Reconstruction of Protolanguage Word Forms (He et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.91.pdf
Video:
https://aclanthology.org/2023.acl-long.91.mp4