DetectiveReDASers at HSD-2Lang 2024: A New Pooling Strategy with Cross-lingual Augmentation and Ensembling for Hate Speech Detection in Low-resource Languages

Fatima Zahra Qachfar, Bryan Tuck, Rakesh Verma


Abstract
This paper addresses hate speech detection in Turkish and Arabic tweets as a contribution to the HSD-2Lang Shared Task. We propose a specialized pooling strategy within a soft-voting ensemble framework to improve classification with Turkish and Arabic language models. Our approach also expands the training sets through cross-lingual translation, introducing a broader spectrum of hate speech examples. Our method attains F1-Macro scores of 0.6964 for Turkish (Subtask A) and 0.7123 for Arabic (Subtask B). We also consider the computational overhead, balancing it against the effectiveness of the pooling strategy, data augmentation, and soft-voting ensemble. This approach advances the practical application of language models to hate speech detection in low-resource languages.
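The abstract names soft voting but does not describe how the ensemble is configured. As a rough illustration of the general technique only, and not the authors' implementation, the sketch below averages per-class probabilities from several classifiers with equal weights and picks the highest-scoring class. The function name `soft_vote` and the probability values are hypothetical.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities across models, then take the argmax.

    prob_sets[m][i][c] is model m's probability that example i has class c.
    Equal model weights are an assumption of this sketch.
    """
    n_models = len(prob_sets)
    n_examples = len(prob_sets[0])
    predictions = []
    for i in range(n_examples):
        n_classes = len(prob_sets[0][i])
        # Mean probability of each class over all models for example i.
        avg = [
            sum(model[i][c] for model in prob_sets) / n_models
            for c in range(n_classes)
        ]
        # Index of the class with the highest mean probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Hypothetical outputs of three classifiers on two tweets:
# each row is [P(not hate), P(hate)].
model_a = [[0.60, 0.40], [0.20, 0.80]]
model_b = [[0.45, 0.55], [0.30, 0.70]]
model_c = [[0.70, 0.30], [0.55, 0.45]]

labels = soft_vote([model_a, model_b, model_c])
print(labels)  # [0, 1]
```

Unlike hard (majority) voting, averaging probabilities lets a confident model outweigh two uncertain ones, which is one common reason to prefer it when the ensemble members are calibrated differently.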
Anthology ID:
2024.case-1.28
Volume:
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Ali Hürriyetoğlu, Hristo Tanev, Surendrabikram Thapa, Gökçe Uludoğan
Venues:
CASE | WS
Publisher:
Association for Computational Linguistics
Pages:
199–204
URL:
https://aclanthology.org/2024.case-1.28
Cite (ACL):
Fatima Zahra Qachfar, Bryan Tuck, and Rakesh Verma. 2024. DetectiveReDASers at HSD-2Lang 2024: A New Pooling Strategy with Cross-lingual Augmentation and Ensembling for Hate Speech Detection in Low-resource Languages. In Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024), pages 199–204, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
DetectiveReDASers at HSD-2Lang 2024: A New Pooling Strategy with Cross-lingual Augmentation and Ensembling for Hate Speech Detection in Low-resource Languages (Qachfar et al., CASE-WS 2024)
PDF:
https://aclanthology.org/2024.case-1.28.pdf
Supplementary material:
 2024.case-1.28.SupplementaryMaterial.txt