Four Approaches to Low-Resource Multilingual NMT: The Helsinki Submission to the AmericasNLP 2023 Shared Task

Ona De Gibert, Raúl Vázquez, Mikko Aulamo, Yves Scherrer, Sami Virpioja, Jörg Tiedemann


Abstract
The Helsinki-NLP team participated in the AmericasNLP 2023 Shared Task with 6 submissions for all 11 language pairs arising from 4 different multilingual systems. We provide a detailed look at the work that went into collecting and preprocessing the data that led to our submissions. We explore various setups for multilingual Neural Machine Translation (NMT), namely knowledge distillation and transfer learning, multilingual NMT including a high-resource language (English), language-specific fine-tuning, and multilingual NMT exclusively using low-resource data. Our multilingual Model B ranks first in 4 out of the 11 language pairs.
Anthology ID:
2023.americasnlp-1.20
Volume:
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
Venue:
AmericasNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
177–191
Language:
URL:
https://aclanthology.org/2023.americasnlp-1.20
DOI:
10.18653/v1/2023.americasnlp-1.20
Bibkey:
Cite (ACL):
Ona De Gibert, Raúl Vázquez, Mikko Aulamo, Yves Scherrer, Sami Virpioja, and Jörg Tiedemann. 2023. Four Approaches to Low-Resource Multilingual NMT: The Helsinki Submission to the AmericasNLP 2023 Shared Task. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 177–191, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Four Approaches to Low-Resource Multilingual NMT: The Helsinki Submission to the AmericasNLP 2023 Shared Task (De Gibert et al., AmericasNLP 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.americasnlp-1.20.pdf