Detecting Erroneously Recognized Handwritten Byzantine Text

John Pavlopoulos, Vasiliki Kougia, Paraskevi Platanou, Holger Essler


Abstract
Handwritten text recognition (HTR) yields textual output that contains considerably more errors than recognised printed (OCRed) text. Post-correction methods can eliminate such errors but may also introduce new ones. In this study, we investigate the issues arising from this reality in Byzantine Greek. We investigate the properties of the texts that lead post-correction systems to this adversarial behaviour, and we experiment with text classification systems that learn to detect incorrect recognition output. A large masked language model, pre-trained on modern Greek and fine-tuned on Byzantine Greek, achieves an Average Precision score of 95%. The score improves to 97% with a model that is pre-trained on modern and then on ancient Greek, the two language forms whose elements Byzantine Greek combines. A century-based analysis shows that the advantage of the classifier further pre-trained on ancient Greek lies in texts from older centuries. Applying this classifier before a neural post-corrector on HTRed text significantly reduced post-correction mistakes.
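The abstract reports results as Average Precision and describes running the error detector before a neural post-corrector. The following is a minimal Python sketch of both ideas, not the authors' implementation. It assumes binary labels (1 = erroneously recognised line, 0 = correct line) and a detector that returns a score in [0, 1]; the names `detector`, `corrector`, `gated_post_correction`, and the 0.5 threshold are hypothetical placeholders, not the paper's models or settings.

```python
from typing import Callable, List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average Precision: mean of precision@k taken at the rank k of each positive.

    labels: 1 = erroneous HTR line (positive class), 0 = correct line.
    scores: detector confidence that a line is erroneous.
    Assumption: tied scores are broken by input order (sorted() is stable).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive examples")
    # Rank lines from most to least likely to be erroneous.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def gated_post_correction(
    lines: Sequence[str],
    detector: Callable[[str], float],  # hypothetical: score that a line is erroneous
    corrector: Callable[[str], str],   # hypothetical: post-correction model
    threshold: float = 0.5,            # hypothetical decision threshold
) -> List[str]:
    """Post-correct only the lines the detector flags; leave the rest untouched,
    so the corrector cannot introduce errors into lines judged already correct."""
    return [corrector(line) if detector(line) >= threshold else line for line in lines]


if __name__ == "__main__":
    # Toy data: ranks of the positives are 1, 3, 4, so AP = (1 + 2/3 + 3/4) / 3 = 29/36.
    print(average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2]))
    # Toy detector/corrector: treat "x" as a recognition error to be fixed.
    print(gated_post_correction(["abx", "cd"],
                                lambda s: 1.0 if "x" in s else 0.0,
                                lambda s: s.replace("x", "o")))
```

The gating design reflects the abstract's motivation: since post-correctors may also introduce errors, restricting them to lines the classifier flags leaves text judged correct unchanged.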
Anthology ID:
2023.findings-emnlp.524
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7818–7828
URL:
https://aclanthology.org/2023.findings-emnlp.524
DOI:
10.18653/v1/2023.findings-emnlp.524
Cite (ACL):
John Pavlopoulos, Vasiliki Kougia, Paraskevi Platanou, and Holger Essler. 2023. Detecting Erroneously Recognized Handwritten Byzantine Text. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7818–7828, Singapore. Association for Computational Linguistics.
Cite (Informal):
Detecting Erroneously Recognized Handwritten Byzantine Text (Pavlopoulos et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.524.pdf