An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models

Saghar Hosseini, Hamid Palangi, Ahmed Hassan Awadallah


Abstract
Large-scale Pre-Trained Language Models (PTLMs) capture knowledge from massive human-written data, which contains latent societal biases and toxic content. In this paper, we leverage the primary task of PTLMs, i.e., language modeling, and propose a new metric to quantify manifested implicit representational harms in PTLMs towards 13 marginalized demographics. Using this metric, we conduct an empirical analysis of 24 widely used PTLMs. Our analysis provides insights into the correlation between our proposed metric and other related representational harm metrics. We observe that our metric correlates with most of the gender-specific metrics in the literature. Through extensive experiments, we explore the connections between PTLM architectures and representational harms across two dimensions: the depth and width of the networks. We find that prioritizing depth over width mitigates representational harms in some PTLMs. Our code and data can be found at [place holder].
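To make the likelihood-based idea concrete, below is a minimal, illustrative sketch of how one might probe a causal language model by comparing its average per-token log-likelihood for otherwise-identical sentences that mention different demographic groups. This is not the paper's exact metric; the template sentence, the demographic pair, and the choice of GPT-2 are hypothetical assumptions for illustration only.

```python
# Sketch (NOT the paper's metric): compare a causal LM's average per-token
# log-likelihood for paired sentences differing only in the demographic term.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"  # assumed model for illustration
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

def avg_log_likelihood(sentence: str) -> float:
    """Average per-token log-likelihood of `sentence` under the language model."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean negative log-likelihood over tokens, so negate it.
    return -out.loss.item()

# Hypothetical probe template and demographic pair.
template = "The {} neighbors were described as untrustworthy."
groups = ["young", "elderly"]

scores = {g: avg_log_likelihood(template.format(g)) for g in groups}
gap = scores[groups[0]] - scores[groups[1]]
print(scores, "likelihood gap:", gap)
```

A systematically large gap across many such templates would suggest the model assigns stereotyped statements higher likelihood for one group than another; the paper's metric aggregates this kind of language-modeling signal across 13 marginalized demographics.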
Anthology ID:
2023.trustnlp-1.11
Volume:
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anaelia Ovalle, Kai-Wei Chang, Ninareh Mehrabi, Yada Pruksachatkun, Aram Galstyan, Jwala Dhamala, Apurv Verma, Trista Cao, Anoop Kumar, Rahul Gupta
Venue:
TrustNLP
Publisher:
Association for Computational Linguistics
Pages:
121–134
URL:
https://aclanthology.org/2023.trustnlp-1.11
DOI:
10.18653/v1/2023.trustnlp-1.11
Cite (ACL):
Saghar Hosseini, Hamid Palangi, and Ahmed Hassan Awadallah. 2023. An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models. In Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), pages 121–134, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models (Hosseini et al., TrustNLP 2023)
PDF:
https://aclanthology.org/2023.trustnlp-1.11.pdf
Video:
https://aclanthology.org/2023.trustnlp-1.11.mp4