Tadesse Destaw Belay


2023

Exploring Amharic Hate Speech Data Collection and Classification Approaches
Abinew Ali Ayele | Seid Muhie Yimam | Tadesse Destaw Belay | Tesfa Asfaw | Chris Biemann
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

In this paper, we present a study of efficient data selection and annotation strategies for Amharic hate speech. We also build various classification models and investigate the challenges of hate speech data selection, annotation, and classification for the Amharic language. From a total of over 18 million tweets in our Twitter corpus, 15.1k tweets are annotated by two independent native speakers, and a Cohen’s kappa score of 0.48 is achieved. A third annotator, a curator, is also employed to decide on the final gold labels. We employ both classical machine learning and deep learning approaches, which include fine-tuning AmFLAIR and AmRoBERTa contextual embedding models. Among all the models, AmFLAIR achieves the best performance with an F1-score of 72%. We publicly release the annotation guidelines, keywords/lexicon entries, datasets, models, and associated scripts with a permissive license.
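The agreement figure reported above is Cohen's kappa, which corrects the raw agreement rate between two annotators for the agreement expected by chance. The following is a minimal sketch of that computation, not the authors' released scripts; the function name `cohen_kappa` and the toy label values are assumptions made for illustration and are not taken from the paper's dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of each annotator's label rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations (not data from the paper).
annotator_1 = ["hate", "normal", "hate", "normal", "offensive", "normal"]
annotator_2 = ["hate", "normal", "normal", "normal", "offensive", "hate"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on 4 of 6 items, yet kappa is noticeably lower than the raw agreement rate because part of that agreement is expected by chance.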

Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities
Atnafu Lambebo Tonja | Tadesse Destaw Belay | Israel Abebe Azime | Abinew Ali Ayele | Moges Ahmed Mehamed | Olga Kolesnikova | Seid Muhie Yimam
Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)

This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. Through this paper, we identify key challenges and opportunities for NLP research in Ethiopia. Furthermore, we provide a centralized repository on GitHub that contains publicly available resources for various NLP tasks in these languages. This repository can be updated periodically with contributions from other researchers. Our objective is to disseminate information to NLP researchers interested in Ethiopian languages and encourage future research in this domain.