Silver Data for Coreference Resolution in Ukrainian: Translation, Alignment, and Projection

Pavlo Kuchmiichuk


Abstract
Low-resource languages continue to present challenges for current NLP methods, and multilingual NLP is gaining attention in the research community. One of the main issues is the lack of sufficient high-quality annotated data for low-resource languages. In this paper, we show how labeled data for high-resource languages such as English can be used in low-resource NLP. We present two silver datasets for coreference resolution in Ukrainian, adapted from existing English data by manual translation and machine translation in combination with automatic alignment and annotation projection. The code is made publicly available.
Anthology ID:
2023.unlp-1.8
Volume:
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editor:
Mariana Romanyshyn
Venue:
UNLP
Publisher:
Association for Computational Linguistics
Pages:
62–72
URL:
https://aclanthology.org/2023.unlp-1.8
DOI:
10.18653/v1/2023.unlp-1.8
Cite (ACL):
Pavlo Kuchmiichuk. 2023. Silver Data for Coreference Resolution in Ukrainian: Translation, Alignment, and Projection. In Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP), pages 62–72, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Silver Data for Coreference Resolution in Ukrainian: Translation, Alignment, and Projection (Kuchmiichuk, UNLP 2023)
PDF:
https://aclanthology.org/2023.unlp-1.8.pdf
Video:
https://aclanthology.org/2023.unlp-1.8.mp4