CLTL@Multimodal Hate Speech Event Detection 2024: The Winning Approach to Detecting Multimodal Hate Speech and Its Targets

Yeshan Wang, Ilia Markov


Abstract
In the context of the proliferation of multimodal hate speech related to the Russia-Ukraine conflict, we introduce a unified multimodal fusion system for detecting hate speech and its targets in text-embedded images. Our approach uses the Twitter-based RoBERTa and Swin Transformer V2 models to encode the textual and visual modalities, and employs a Multilayer Perceptron (MLP) fusion mechanism for classification. Our system achieved macro F1 scores of 87.27% for hate speech detection and 80.05% for hate speech target detection in the Multimodal Hate Speech Event Detection Challenge 2024, ranking first in both subtasks. We open-source the trained models at https://huggingface.co/Yestin-Wang
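
The scores reported above are macro F1, that is, the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. The following is a minimal sketch of that metric using only the Python standard library. The function name `macro_f1` and the toy labels are illustrative assumptions. They are not taken from the paper or from the challenge data, and this is not the organizers' evaluation script.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    # Score every label that appears in either the gold or predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels invented for illustration only (hypothetical, not challenge data).
y_true = ["hate", "hate", "no_hate", "no_hate", "no_hate", "no_hate"]
y_pred = ["hate", "no_hate", "no_hate", "no_hate", "no_hate", "hate"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

In this toy case the per-class F1 scores are 0.5 for `hate` and 0.75 for `no_hate`, so the macro average is 0.625, whereas plain accuracy would be 4/6.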
Anthology ID:
2024.case-1.9
Volume:
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Ali Hürriyetoğlu, Hristo Tanev, Surendrabikram Thapa, Gökçe Uludoğan
Venues:
CASE | WS
Publisher:
Association for Computational Linguistics
Pages:
73–78
URL:
https://aclanthology.org/2024.case-1.9
Cite (ACL):
Yeshan Wang and Ilia Markov. 2024. CLTL@Multimodal Hate Speech Event Detection 2024: The Winning Approach to Detecting Multimodal Hate Speech and Its Targets. In Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024), pages 73–78, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
CLTL@Multimodal Hate Speech Event Detection 2024: The Winning Approach to Detecting Multimodal Hate Speech and Its Targets (Wang & Markov, CASE-WS 2024)
PDF:
https://aclanthology.org/2024.case-1.9.pdf
Supplementary material:
 2024.case-1.9.SupplementaryMaterial.txt