byteLLM@LT-EDI-2024: Homophobia/Transphobia Detection in Social Media Comments - Custom Subword Tokenization with Subword2Vec and BiLSTM

Durga Manukonda, Rohith Kodali


Abstract
This research focuses on homophobia and transphobia detection in Dravidian languages, specifically Telugu, Kannada, Tamil, and Malayalam. Using the Homophobia/Transphobia Detection dataset, we propose an approach that pairs a custom-designed subword tokenizer with a Bidirectional Long Short-Term Memory (BiLSTM) architecture. Our distinctive contribution is a tokenizer that keeps model sizes below 7 MB, improving efficiency and addressing real-time deployment challenges. The BiLSTM implementation shows significant improvements in hate speech detection accuracy by capturing linguistic nuances effectively. Small models ease inference constraints, enabling fast real-time detection and practical deployment. This work presents a framework for hate speech detection and provides insights into model size, inference speed, and real-time deployment challenges in combating online hate speech in Dravidian languages.
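The abstract links the small model size to the tokenizer. One way to see why: in an embedding + BiLSTM classifier, the embedding table (vocabulary size × embedding dimension) usually dominates the parameter count, so a compact subword vocabulary shrinks the whole model. The sketch below estimates float32 model size under assumed settings. The vocabulary sizes, embedding dimension, hidden size, and 3-class output are hypothetical illustration values, not figures from the paper, and the helper names are likewise hypothetical.

```python
def lstm_params(input_dim: int, hidden: int) -> int:
    """Parameters of one unidirectional LSTM layer, Keras convention:
    4 gates, each with input weights, recurrent weights and one bias vector.
    (PyTorch's nn.LSTM adds a second bias vector, i.e. 4 * hidden more.)"""
    return 4 * (hidden * (input_dim + hidden) + hidden)


def bilstm_classifier_params(vocab_size: int, embed_dim: int,
                             hidden: int, num_classes: int) -> int:
    """Hypothetical embedding -> single-layer BiLSTM -> dense classifier."""
    embedding = vocab_size * embed_dim                 # one vector per token
    bilstm = 2 * lstm_params(embed_dim, hidden)        # forward + backward LSTM
    dense = 2 * hidden * num_classes + num_classes     # concatenated states -> logits
    return embedding + bilstm + dense


def size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size in decimal megabytes, assuming float32 weights."""
    return num_params * bytes_per_param / 1_000_000


# Assumed settings: 100-d embeddings, 64 hidden units per direction, 3 classes.
for vocab in (10_000, 200_000):  # compact subword vocab vs. large word-level vocab
    p = bilstm_classifier_params(vocab, embed_dim=100, hidden=64, num_classes=3)
    print(f"vocab={vocab}: {p} params, {size_mb(p):.2f} MB")
# vocab=10000: 1084867 params, 4.34 MB
# vocab=200000: 20084867 params, 80.34 MB
```

Under these assumptions the BiLSTM and output layer contribute under 0.1 M parameters, so the vocabulary size alone decides whether the model lands above or below a few megabytes.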
Anthology ID: 2024.ltedi-1.16
Volume: Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Month: March
Year: 2024
Address: St. Julian's, Malta
Editors: Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar, Thenmozhi Durairaj, György Kovács, Miguel Ángel García Cumbreras
Venues: LTEDI | WS
Publisher: Association for Computational Linguistics
Pages: 157–163
URL: https://aclanthology.org/2024.ltedi-1.16
Cite (ACL): Durga Manukonda and Rohith Kodali. 2024. byteLLM@LT-EDI-2024: Homophobia/Transphobia Detection in Social Media Comments - Custom Subword Tokenization with Subword2Vec and BiLSTM. In Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 157–163, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal): byteLLM@LT-EDI-2024: Homophobia/Transphobia Detection in Social Media Comments - Custom Subword Tokenization with Subword2Vec and BiLSTM (Manukonda & Kodali, LTEDI-WS 2024)
PDF: https://aclanthology.org/2024.ltedi-1.16.pdf