Intrinsic Bias Metrics Do Not Correlate with Application Bias
Seraphina Goldfarb-Tarrant, Rebecca Marchant, Ricardo Muñoz Sanchez, Mugdha Pandya, Adam Lopez

TL;DR
This paper investigates the relationship between intrinsic and extrinsic bias metrics in NLP models, finding no consistent correlation and emphasizing the importance of extrinsic measures for debiasing efforts.
Contribution
The study provides a comprehensive comparison of intrinsic and extrinsic bias metrics across multiple tasks and languages, shows that the two kinds of metric do not correlate, and argues that debiasing work should focus on extrinsic measures.
Findings
No reliable correlation between intrinsic and extrinsic bias metrics.
Intrinsic metrics do not consistently predict bias in downstream tasks.
The authors release a new intrinsic metric and an annotated test set for gender bias in hate speech detection.
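The central question is whether an intrinsic score and an extrinsic score, measured on the same set of trained models, move together. Below is a minimal sketch of that kind of check: given one intrinsic and one extrinsic bias score per model, compute Pearson and Spearman correlation across models. The function names and the toy scores are hypothetical illustrations, not values or code from the paper, and this is not claimed to be the authors' exact statistical procedure.

```python
from math import sqrt
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-model scores (made up for illustration, NOT paper results).
intrinsic = [0.12, 0.45, 0.30, 0.80, 0.55]  # e.g. an embedding-space bias score
extrinsic = [0.20, 0.18, 0.25, 0.22, 0.19]  # e.g. a downstream fairness gap
print(f"Pearson r    = {pearson(intrinsic, extrinsic):+.2f}")
print(f"Spearman rho = {spearman(intrinsic, extrinsic):+.2f}")
```

Reporting a rank correlation alongside Pearson is a common design choice here, since bias scores on different scales may relate monotonically without relating linearly.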
Abstract
Natural Language Processing (NLP) systems learn harmful societal biases that cause them to amplify inequality as they are deployed in more and more situations. To guide efforts at debiasing these systems, the NLP community relies on a variety of metrics that quantify bias in models. Some of these metrics are intrinsic, measuring bias in word embedding spaces, and some are extrinsic, measuring bias in downstream tasks that the word embeddings enable. Do these intrinsic and extrinsic metrics correlate with each other? We compare intrinsic and extrinsic metrics across hundreds of trained models covering different tasks and experimental conditions. Our results show no reliable correlation between these metrics that holds in all scenarios across tasks and languages. We urge researchers working on debiasing to focus on extrinsic measures of bias, and to make using these measures more feasible…
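The abstract separates intrinsic metrics, computed directly on the embedding space, from extrinsic metrics, computed on downstream task outputs. As an example of the intrinsic kind, here is a minimal sketch of the WEAT effect size of Caliskan et al. (2017), a widely used embedding-space test. It is not claimed to reproduce the paper's exact metrics or settings; the function names and the 2-D toy vectors are hypothetical, and the sketch assumes the sample standard deviation in the denominator (implementations differ on this detail).

```python
from math import sqrt
from statistics import mean, stdev

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def association(w: Vector, A: list[Vector], B: list[Vector]) -> float:
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X: list[Vector], Y: list[Vector],
                     A: list[Vector], B: list[Vector]) -> float:
    """Difference in mean association of target sets X and Y,
    normalised by the (sample) std-dev of associations over X and Y."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Hypothetical toy embeddings (illustrative only).
A = [[1.0, 0.0]]                 # attribute set A, e.g. male terms
B = [[0.0, 1.0]]                 # attribute set B, e.g. female terms
X = [[1.0, 0.1], [0.9, 0.2]]     # target set X, e.g. career words
Y = [[0.1, 1.0], [0.2, 0.9]]     # target set Y, e.g. family words
print(f"WEAT effect size = {weat_effect_size(X, Y, A, B):+.3f}")
```

A positive value means X sits closer to A (and Y closer to B) than the reverse; swapping X and Y flips the sign. The paper's point is that a score of this kind, however large, need not predict bias measured on a downstream task.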
