Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi

Mithun Das; Saurabh Pandey; Shivansh Sethi; Punyajoy Saha; Animesh Mukherjee

Abstract

With the rise of online abuse, the NLP community has begun investigating the use of neural architectures to generate counterspeech that can “counter” the vicious tone of such abusive speech and dilute/ameliorate its rippling effect over the social network. However, most efforts so far have focused on English. To bridge the gap for low-resource languages such as Bengali and Hindi, we create a benchmark dataset of 5,062 abusive speech/counterspeech pairs, of which 2,460 pairs are in Bengali and 2,602 pairs are in Hindi. To set up an effective benchmark, we implement several baseline models that generate counterspeech under various interlingual transfer mechanisms and configurations. We observe that the monolingual setup yields the best performance. Further, using synthetic transfer, language models can generate counterspeech to some extent; specifically, we notice that transferability is better when languages belong to the same language family.

Anthology ID: 2024.findings-eacl.111
Volume: Findings of the Association for Computational Linguistics: EACL 2024
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Yvette Graham, Matthew Purver
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1601–1614
URL: https://aclanthology.org/2024.findings-eacl.111
Cite (ACL): Mithun Das, Saurabh Pandey, Shivansh Sethi, Punyajoy Saha, and Animesh Mukherjee. 2024. Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi. In Findings of the Association for Computational Linguistics: EACL 2024, pages 1601–1614, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi (Das et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-eacl.111.pdf
Note: 2024.findings-eacl.111.note.zip
