Transformer-based Context Aware Morphological Analyzer for Telugu

Priyanka Dasari; Abhijith Chelpuri; Nagaraju Vuppala; Mounika Marreddy; Parameshwari Krishnamurthy; Radhika Mamidi

Abstract

This paper addresses the challenges that Indian languages face in leveraging deep learning for natural language processing (NLP), which stem from the limited availability of resources, annotated datasets, and Transformer-based architectures. We focus specifically on Telugu and aim to construct a Telugu morphological analyzer dataset comprising 10,000 sentences. We also assess the performance of established multilingual Transformer models (mBERT, XLM-R, IndicBERT) and of a monolingual Transformer model (BERT-Te) trained from scratch on a large Telugu corpus of 8,015,588 sentences. Our findings show that Transformer-based representations pretrained on Telugu data improve the performance of the Telugu morphological analyzer and surpass existing multilingual approaches. This highlights the need to develop dedicated corpora, annotated datasets, and machine learning models in a monolingual setting. We present benchmark results for the Telugu morphological analyzer, obtained through simple fine-tuning on our dataset.

Anthology ID: 2023.dravidianlangtech-1.4
Volume: Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Month: September
Year: 2023
Address: Varna, Bulgaria
Editors: Bharathi R. Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Sajeetha Thavareesan, Elizabeth Sherly
Venues: DravidianLangTech | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 25–32
URL: https://aclanthology.org/2023.dravidianlangtech-1.4
Cite (ACL): Priyanka Dasari, Abhijith Chelpuri, Nagaraju Vuppala, Mounika Marreddy, Parameshwari Krishnamurthy, and Radhika Mamidi. 2023. Transformer-based Context Aware Morphological Analyzer for Telugu. In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages, pages 25–32, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Transformer-based Context Aware Morphological Analyzer for Telugu (Dasari et al., DravidianLangTech-WS 2023)
PDF: https://aclanthology.org/2023.dravidianlangtech-1.4.pdf