Fine-tuning MBART-50 with French and Farsi data to improve the translation of Farsi dislocations into English and French

Behnoosh Namdarzadeh, Sadaf Mohseni, Lichao Zhu, Guillaume Wisniewski, Nicolas Ballier


Abstract
In this paper, we discuss the improvements brought by the fine-tuning of mBART50 for the translation of a specific Farsi dataset of dislocations. Given our BLEU scores, our evaluation is mostly qualitative: we assess the improvements of our fine-tuning in the translations into French of our Farsi test dataset. We describe the fine-tuning procedure and discuss the quality of the resulting translations from Farsi. For the French translations, we assess the sentences that contain English tokens; for the English translations, we examine the ability of the fine-tuned system to translate Farsi dislocations into English without replicating the dislocated item as a double subject. We scrutinized the Farsi training data used to train mBART50 (Tang et al., 2021). We fine-tuned mBART50 with samples from an in-house French-Farsi aligned translation of a short story. In spite of the scarcity of available resources, we found that fine-tuning with aligned French-Farsi data dramatically improved the grammatical well-formedness of the predictions for French, even if serious semantic issues remained. We replicated the experiment with the English translation of the same Farsi short story for a Farsi-English fine-tuning and found that similar semantic inadequacies cropped up, and that some translations were worse than our mBART50 baseline. We showcased the fine-tuning of mBART50 with supplementary data and discussed the asymmetry of the situation: adding a small amount of data during fine-tuning is sufficient to improve morpho-syntax for one language pair but seems to degrade translation into English.
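The abstract does not say how French outputs containing English tokens were identified. As a purely illustrative sketch, one crude way to flag candidates for manual review is to look for a few English function words in each French prediction. The word list, the function names `english_tokens` and `flag_mixed`, and the example sentences are all assumptions made for this illustration and are not taken from the paper.

```python
import re

# Hypothetical, hand-picked English function words that are not French
# words. This list is an assumption for illustration only; it is not the
# authors' method and will miss most English content words.
ENGLISH_MARKERS = {
    "the", "and", "is", "was", "of", "with",
    "that", "this", "she", "they", "his", "her",
}


def english_tokens(sentence):
    """Return the tokens of `sentence` found in ENGLISH_MARKERS."""
    # Lowercase, then keep runs of letters (accented letters included).
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    return [tok for tok in tokens if tok in ENGLISH_MARKERS]


def flag_mixed(translations):
    """Return (index, hits) for each translation with at least one hit."""
    flagged = []
    for index, sentence in enumerate(translations):
        hits = english_tokens(sentence)
        if hits:
            flagged.append((index, hits))
    return flagged


# Invented example predictions, not data from the paper.
predictions = [
    "Le livre, je l'ai lu hier.",
    "Le livre, I read it with ma soeur.",
]
print(flag_mixed(predictions))
```

A heuristic like this only narrows down which predictions to inspect by hand; it cannot replace the qualitative assessment described in the abstract.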
Anthology ID:
2023.mtsummit-users.14
Volume:
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track
Month:
September
Year:
2023
Address:
Macau SAR, China
Editors:
Masaru Yamada, Felix do Carmo
Venue:
MTSummit
Publisher:
Asia-Pacific Association for Machine Translation
Pages:
152–161
URL:
https://aclanthology.org/2023.mtsummit-users.14
Cite (ACL):
Behnoosh Namdarzadeh, Sadaf Mohseni, Lichao Zhu, Guillaume Wisniewski, and Nicolas Ballier. 2023. Fine-tuning MBART-50 with French and Farsi data to improve the translation of Farsi dislocations into English and French. In Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track, pages 152–161, Macau SAR, China. Asia-Pacific Association for Machine Translation.
Cite (Informal):
Fine-tuning MBART-50 with French and Farsi data to improve the translation of Farsi dislocations into English and French (Namdarzadeh et al., MTSummit 2023)
PDF:
https://aclanthology.org/2023.mtsummit-users.14.pdf