Unveiling Identity Biases in Toxicity Detection : A Game-Focused Dataset and Reactivity Analysis Approach

Josiane Van Dorpe, Zachary Yang, Nicolas Grenon-Godbout, Grégoire Winterstein


Abstract
Identity biases commonly arise from annotated datasets, can be propagated by language models, and can cause further harm to marginalized groups. Existing bias benchmarking datasets focus mainly on gender or racial biases and are designed to pinpoint which class a model is biased towards. They are also not designed for the gaming industry, a concern for models built to detect toxicity in video game chat. We propose a dataset and a method to highlight oversensitive terms using reactivity analysis and the model's performance. We test our dataset against ToxBuster, a language model developed by Ubisoft and fine-tuned for toxicity detection on multiplayer video games' written chat, and against Perspective API. We find that these toxicity models often automatically tag terms related to a community's identity as toxic, which prevents members of already marginalized groups from making their presence known or having a mature / normal conversation. Through this process, we have generated a list of terms that trigger the models to varying degrees, along with insights on establishing a baseline through human annotations.
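To make the reactivity analysis described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation: it fills neutral chat-style templates with candidate identity terms, scores each filled sentence with a toxicity model, and flags terms whose mere presence pushes the average score past a threshold. The score_toxicity placeholder, the templates, the probe-term list, the baseline filler word, and the threshold are all assumptions made for illustration; ToxBuster is not publicly callable, and Perspective API would be queried through its own client instead of the toy heuristic shown here.

    import re
    from statistics import mean

    # Placeholder toxicity scorer (purely illustrative keyword heuristic).
    # Swap in a real model, e.g. a Perspective API call or a fine-tuned classifier,
    # returning a toxicity probability in [0, 1].
    def score_toxicity(text: str) -> float:
        toxic_markers = {"idiot", "trash"}
        words = set(re.findall(r"[a-z']+", text.lower()))
        return 0.9 if words & toxic_markers else 0.1

    # Neutral chat-style templates: the identity term is the only element that varies.
    TEMPLATES = [
        "I'm {term} and I main support.",
        "There are a lot of {term} players on this server.",
        "My friend is {term}, want to queue with us?",
    ]

    TERMS = ["gay", "trans", "muslim", "deaf"]   # illustrative probe terms
    BASELINE_TERM = "new"                        # neutral filler for the per-template baseline
    THRESHOLD = 0.5                              # score above which a sentence counts as toxic

    def reactivity(term: str) -> tuple[float, float]:
        """Mean toxicity score for a term and its mean shift versus the neutral baseline."""
        scores, shifts = [], []
        for template in TEMPLATES:
            base = score_toxicity(template.format(term=BASELINE_TERM))
            probe = score_toxicity(template.format(term=term))
            scores.append(probe)
            shifts.append(probe - base)
        return mean(scores), mean(shifts)

    if __name__ == "__main__":
        for term in TERMS:
            score, shift = reactivity(term)
            flag = "oversensitive" if score >= THRESHOLD else "ok"
            print(f"{term:>8}  mean score = {score:.2f}  shift = {shift:+.2f}  [{flag}]")

Because every template is non-toxic by construction, any term that scores above the threshold on its own is a candidate oversensitive term; averaging the shift over several templates reduces the influence of any single sentence.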
Anthology ID:
2023.emnlp-industry.26
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2023
Address:
Singapore
Editors:
Mingxuan Wang, Imed Zitouni
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
263–274
URL:
https://aclanthology.org/2023.emnlp-industry.26
DOI:
10.18653/v1/2023.emnlp-industry.26
Cite (ACL):
Josiane Van Dorpe, Zachary Yang, Nicolas Grenon-Godbout, and Grégoire Winterstein. 2023. Unveiling Identity Biases in Toxicity Detection : A Game-Focused Dataset and Reactivity Analysis Approach. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 263–274, Singapore. Association for Computational Linguistics.
Cite (Informal):
Unveiling Identity Biases in Toxicity Detection : A Game-Focused Dataset and Reactivity Analysis Approach (Van Dorpe et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-industry.26.pdf
Video:
https://aclanthology.org/2023.emnlp-industry.26.mp4