HW-TSC’s Participation in the WMT 2023 Automatic Post Editing Shared Task

Jiawei Yu, Min Zhang, Zhao Yanqing, Xiaofeng Zhao, Yuang Li, Su Chang, Yinglu Li, Ma Miaomiao, Shimin Tao, Hao Yang


Abstract
This paper presents HW-TSC’s submission to the WMT 2023 Automatic Post Editing (APE) shared task for the English-Marathi (En-Mr) language pair. Our method consists of several steps. First, we pre-train an APE model on the synthetic APE data provided by the task organizers. We then fine-tune the model on real APE data. For data augmentation, we incorporate candidate translations obtained from an external Machine Translation (MT) system, and we add the En-Mr parallel corpus from the Flores-200 dataset to our training data. To mitigate overfitting, we apply R-Drop during training. Because APE systems tend to over-correct, we use a sentence-level Quality Estimation (QE) system to select the final output, choosing between the original translation and the corresponding output of the APE model. Our experiments show that pre-trained APE models are effective when fine-tuned on an APE corpus of limited size, and that performance improves further with external MT augmentation. On the development set, our approach lowers TER by 2.42 points and raises BLEU by 3.76 points.
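The QE-based selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the QE scores are compared, so everything below is an assumption made for illustration: the function name `select_outputs`, the `qe_score` callable (higher means better), the `margin` parameter, and the toy scorer that stands in for a real sentence-level QE model.

```python
from typing import Callable, List


def select_outputs(
    sources: List[str],
    mt_outputs: List[str],
    ape_outputs: List[str],
    qe_score: Callable[[str, str], float],
    margin: float = 0.0,
) -> List[str]:
    """Pick, per segment, either the original MT output or the APE output.

    Assumption: qe_score(source, hypothesis) returns a quality score where
    higher is better. The APE output is accepted only if it beats the
    original translation by more than `margin`, which guards against
    over-correction of translations that were already good.
    """
    if not (len(sources) == len(mt_outputs) == len(ape_outputs)):
        raise ValueError("sources, mt_outputs and ape_outputs must align")

    selected = []
    for src, mt, ape in zip(sources, mt_outputs, ape_outputs):
        if qe_score(src, ape) > qe_score(src, mt) + margin:
            selected.append(ape)
        else:
            selected.append(mt)
    return selected


def toy_qe_score(source: str, hypothesis: str) -> float:
    """Hypothetical stand-in for a QE model: penalize length mismatch."""
    return -abs(len(source.split()) - len(hypothesis.split()))


if __name__ == "__main__":
    sources = ["a b c", "d e"]
    mt_outputs = ["x y z", "p q r s"]
    ape_outputs = ["x y z w w", "p q"]
    print(select_outputs(sources, mt_outputs, ape_outputs, toy_qe_score))
```

In practice the toy scorer would be replaced by a trained sentence-level QE system. A positive `margin` makes the selection more conservative, keeping the original translation unless the APE output is scored clearly better.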
Anthology ID:
2023.wmt-1.85
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
926–930
URL:
https://aclanthology.org/2023.wmt-1.85
DOI:
10.18653/v1/2023.wmt-1.85
Cite (ACL):
Jiawei Yu, Min Zhang, Zhao Yanqing, Xiaofeng Zhao, Yuang Li, Su Chang, Yinglu Li, Ma Miaomiao, Shimin Tao, and Hao Yang. 2023. HW-TSC’s Participation in the WMT 2023 Automatic Post Editing Shared Task. In Proceedings of the Eighth Conference on Machine Translation, pages 926–930, Singapore. Association for Computational Linguistics.
Cite (Informal):
HW-TSC’s Participation in the WMT 2023 Automatic Post Editing Shared Task (Yu et al., WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.85.pdf