A Sanskrit grammar-based approach to identify and address gaps in Google Translate’s Sanskrit-English zero-shot NMT

Amit Rao, Kanchi Gopinath


Abstract
In this work, we evaluate Google Translate’s recently introduced Sanskrit-English translation system using a relatively small set of probe test cases. The cases focus on areas that, based on knowledge of Sanskrit and English grammar, we expect to pose a challenge for translation between the two languages. We summarize findings that point to significant gaps in the current Zero-Shot Neural Multilingual Translation (Zero-Shot NMT) approach to Sanskrit-English translation. We then suggest an approach based on Sanskrit grammar for creating a differential parallel corpus to serve as corrective training data that addresses such gaps. This approach should also generalize to other language pairs that have few learning resources but a well-developed grammatical theory.
Anthology ID:
2023.clasp-1.16
Original:
2023.clasp-1.16v1
Version 2:
2023.clasp-1.16v2
Volume:
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
Month:
September
Year:
2023
Address:
Gothenburg, Sweden
Editors:
Ellen Breitholtz, Shalom Lappin, Sharid Loaiciga, Nikolai Ilinykh, Simon Dobnik
Venue:
CLASP
Publisher:
Association for Computational Linguistics
Pages:
141–166
URL:
https://aclanthology.org/2023.clasp-1.16
Cite (ACL):
Amit Rao and Kanchi Gopinath. 2023. A Sanskrit grammar-based approach to identify and address gaps in Google Translate’s Sanskrit-English zero-shot NMT. In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 141–166, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):
A Sanskrit grammar-based approach to identify and address gaps in Google Translate’s Sanskrit-English zero-shot NMT (Rao & Gopinath, CLASP 2023)
PDF:
https://aclanthology.org/2023.clasp-1.16.pdf