Towards Multilingual Interlinear Morphological Glossing

Shu Okabe, François Yvon


Abstract
Interlinear Morphological Glosses are annotations produced in the context of language documentation. Their goal is to identify the morphs occurring in an L1 sentence and to make their function and meaning explicit, with the further support of an associated translation in L2. Here we study the task of automatic glossing, aiming to provide linguists with adequate tools to facilitate this process. Our formalisation of glossing uses a latent variable Conditional Random Field (CRF), which labels the L1 morphs while simultaneously aligning them to L2 words. In experiments with several under-resourced languages, we show that this approach is both effective and data-efficient, and that it mitigates the problem of annotating unknown morphs. We also discuss various design choices regarding the alignment process and the selection of features. Finally, we demonstrate that the approach can benefit from multilingual (pre-)training, achieving results that outperform very strong baselines.
Anthology ID:
2023.findings-emnlp.396
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5958–5971
URL:
https://aclanthology.org/2023.findings-emnlp.396
DOI:
10.18653/v1/2023.findings-emnlp.396
Cite (ACL):
Shu Okabe and François Yvon. 2023. Towards Multilingual Interlinear Morphological Glossing. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5958–5971, Singapore. Association for Computational Linguistics.
Cite (Informal):
Towards Multilingual Interlinear Morphological Glossing (Okabe & Yvon, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.396.pdf