Word Level Language Identification in Code-mixed Kannada-English Texts using Deep Learning Approach

Mesay Gemeda Yigezu; Atnafu Lambebo Tonja; Olga Kolesnikova; Moein Shahiki Tash; Grigori Sidorov; Alexander Gelbukh

Abstract

The goal of code-mixed language identification (LID) is to determine which language is spoken or written in a given segment of speech or text, whether a word, a sentence, or a document. Our task is to label each word in the provided data as English, Kannada, or mixed-language. To train our models, we used the CoLI-Kenglish dataset, which contains English, Kannada, and mixed-language words. We conducted several experiments to find the best-performing model. The best model, built with Bidirectional Long Short-Term Memory (Bi-LSTM), outperformed the other trained models with an F1-score of 0.61.

Anthology ID: 2022.icon-wlli.6
Volume: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts
Month: December
Year: 2022
Address: IIIT Delhi, New Delhi, India
Editors: Bharathi Raja Chakravarthi, Abirami Murugappan, Dhivya Chinnappa, Adeep Hane, Prasanna Kumar Kumeresan, Rahul Ponnusamy
Venue: ICON
Publisher: Association for Computational Linguistics
Pages: 29–33
URL: https://aclanthology.org/2022.icon-wlli.6
Cite (ACL): Mesay Gemeda Yigezu, Atnafu Lambebo Tonja, Olga Kolesnikova, Moein Shahiki Tash, Grigori Sidorov, and Alexander Gelbukh. 2022. Word Level Language Identification in Code-mixed Kannada-English Texts using Deep Learning Approach. In Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts, pages 29–33, IIIT Delhi, New Delhi, India. Association for Computational Linguistics.
Cite (Informal): Word Level Language Identification in Code-mixed Kannada-English Texts using Deep Learning Approach (Gemeda Yigezu et al., ICON 2022)
PDF: https://aclanthology.org/2022.icon-wlli.6.pdf
