Machine Learning for Ancient Languages: A Survey

Thea Sommerschield, Yannis Assael, John Pavlopoulos, Vanessa Stefanak, Andrew Senior, Chris Dyer, John Bodel, Jonathan Prag, Ion Androutsopoulos, Nando de Freitas


Abstract
Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses on a scale and in a detail that are reshaping the field of humanities, similarly to how microscopes and telescopes have contributed to the realm of science. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, highlighting promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning.
Anthology ID:
2023.cl-3.5
Volume:
Computational Linguistics, Volume 49, Issue 3 - September 2023
Month:
September
Year:
2023
Address:
Cambridge, MA
Venue:
CL
SIG:
Publisher:
MIT Press
Note:
Pages:
703–747
Language:
URL:
https://aclanthology.org/2023.cl-3.5
DOI:
10.1162/coli_a_00481
Bibkey:
Cite (ACL):
Thea Sommerschield, Yannis Assael, John Pavlopoulos, Vanessa Stefanak, Andrew Senior, Chris Dyer, John Bodel, Jonathan Prag, Ion Androutsopoulos, and Nando de Freitas. 2023. Machine Learning for Ancient Languages: A Survey. Computational Linguistics:703–747.
Cite (Informal):
Machine Learning for Ancient Languages: A Survey (Sommerschield et al., CL 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.cl-3.5.pdf