Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models

Lina Conti, Guillaume Wisniewski


Abstract
Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision. This work takes an initial step towards exploring the less researched topic of how neural models discover linguistic properties of words, such as gender, as well as the rules governing their usage. We propose to use an artificial corpus generated by a PCFG based on French to precisely control the gender distribution in the training data and determine under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.
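As a rough illustration of how a PCFG can fix the gender distribution of a corpus, the sketch below samples sentences from a toy grammar in which a single parameter sets the probability of a feminine noun phrase. The grammar rules, the vocabulary, and the names `make_grammar`, `sample`, `generate_corpus` and `p_fem` are assumptions made for this example; they are not the grammar or code used in the paper.

```python
import random


def make_grammar(p_fem):
    """Build a toy French-like PCFG (hypothetical, for illustration only).

    Each nonterminal maps to a list of (probability, expansion) pairs.
    p_fem is the probability that a noun phrase is feminine.
    """
    return {
        "S": [(1.0, ["NP", "VP"])],
        # The determiner and noun share a gender, so agreement always holds.
        "NP": [(p_fem, ["DET_F", "N_F"]), (1.0 - p_fem, ["DET_M", "N_M"])],
        "VP": [(1.0, ["V"])],
        "DET_F": [(1.0, ["la"])],
        "DET_M": [(1.0, ["le"])],
        "N_F": [(0.5, ["table"]), (0.5, ["voiture"])],
        "N_M": [(0.5, ["livre"]), (0.5, ["chien"])],
        "V": [(0.5, ["tombe"]), (0.5, ["arrive"])],
    }


def sample(grammar, symbol, rng):
    """Recursively expand a symbol into a list of terminal tokens."""
    if symbol not in grammar:
        return [symbol]  # terminal: emit the word itself
    rules = grammar[symbol]
    weights = [p for p, _ in rules]
    _, expansion = rng.choices(rules, weights=weights, k=1)[0]
    tokens = []
    for child in expansion:
        tokens.extend(sample(grammar, child, rng))
    return tokens


def generate_corpus(p_fem, n_sentences, seed=0):
    """Generate sentences whose share of feminine subjects is about p_fem."""
    rng = random.Random(seed)
    grammar = make_grammar(p_fem)
    return [" ".join(sample(grammar, "S", rng)) for _ in range(n_sentences)]


corpus = generate_corpus(p_fem=0.2, n_sentences=1000)
fem_share = sum(s.startswith("la ") for s in corpus) / len(corpus)
print(f"share of feminine subjects: {fem_share:.2f}")
```

Varying `p_fem` between runs would give training corpora that differ only in their gender balance, which is the kind of control the abstract describes.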
Anthology ID:
2023.emnlp-main.641
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10362–10371
URL:
https://aclanthology.org/2023.emnlp-main.641
DOI:
10.18653/v1/2023.emnlp-main.641
Cite (ACL):
Lina Conti and Guillaume Wisniewski. 2023. Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10362–10371, Singapore. Association for Computational Linguistics.
Cite (Informal):
Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models (Conti & Wisniewski, EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.641.pdf
Video:
https://aclanthology.org/2023.emnlp-main.641.mp4