Translated Benchmarks Can Be Misleading: the Case of Estonian Question Answering

Hele-Andra Kuulmets, Mark Fishel


Abstract
Translated test datasets are a popular and cheaper alternative to native test datasets. However, translated data may contain cultural knowledge unfamiliar to speakers of the target language. This can make translated test datasets differ significantly from native target-language datasets, and as a result, model performance in the target language may be estimated inaccurately. In this paper, we use both native and translated Estonian QA datasets to study this issue more closely. We find that relying on the translated test dataset results in an overestimation of the model's performance on native Estonian data.
Anthology ID: 2023.nodalida-1.71
Volume: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month: May
Year: 2023
Address: Tórshavn, Faroe Islands
Editors: Tanel Alumäe, Mark Fishel
Venue: NoDaLiDa
Publisher: University of Tartu Library
Pages: 710–716
URL: https://aclanthology.org/2023.nodalida-1.71
Cite (ACL): Hele-Andra Kuulmets and Mark Fishel. 2023. Translated Benchmarks Can Be Misleading: the Case of Estonian Question Answering. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 710–716, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal): Translated Benchmarks Can Be Misleading: the Case of Estonian Question Answering (Kuulmets & Fishel, NoDaLiDa 2023)
PDF: https://aclanthology.org/2023.nodalida-1.71.pdf