Avalanche at DravidianLangTech: Abusive Comment Detection in Code Mixed Data Using Machine Learning Techniques with Under Sampling

Rajalakshmi Sivanaiah, Rajasekar S, Srilakshmisai K, Angel Deborah S, Mirnalinee ThankaNadar


Abstract
In recent years, the growth of online platforms and social media has given rise to a concerning increase in the presence of abusive content. This poses significant challenges for maintaining a safe and inclusive digital environment. In order to resolve this issue, this paper experiments an approach for detecting abusive comments. We are using a combination of pipelining and vectorization techniques, along with algorithms such as the stochastic gradient descent (SGD) classifier and support vector machine (SVM) classifier. We conducted experiments on an Tamil-English code mixed dataset to evaluate the performance of this approach. Using the stochastic gradient descent classifier algorithm, we achieved a weighted F1 score of 0.76 and a macro score of 0.45 for development dataset. Furthermore, by using the support vector machine classifier algorithm, we obtained a weighted F1 score of 0.78 and a macro score of 0.42 for development dataset. With the test dataset, SGD approach secured 5th rank with 0.44 macro F1 score, while SVM scored 8th rank with 0.35 macro F1 score in the shared task. The top rank team secured 0.55 macro F1 score.
Anthology ID:
2023.dravidianlangtech-1.22
Volume:
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Bharathi R. Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Sajeetha Thavareesan, Elizabeth Sherly
Venues:
DravidianLangTech | WS
SIG:
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Note:
Pages:
166–170
Language:
URL:
https://aclanthology.org/2023.dravidianlangtech-1.22
DOI:
Bibkey:
Cite (ACL):
Rajalakshmi Sivanaiah, Rajasekar S, Srilakshmisai K, Angel Deborah S, and Mirnalinee ThankaNadar. 2023. Avalanche at DravidianLangTech: Abusive Comment Detection in Code Mixed Data Using Machine Learning Techniques with Under Sampling. In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages, pages 166–170, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Avalanche at DravidianLangTech: Abusive Comment Detection in Code Mixed Data Using Machine Learning Techniques with Under Sampling (Sivanaiah et al., DravidianLangTech-WS 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.dravidianlangtech-1.22.pdf