Trustworthy Social Bias Measurement
Rishi Bommasani, Percy Liang

TL;DR
This paper introduces a new framework for measuring social bias in NLP that is grounded in social science principles, rigorously validated, and designed to be trustworthy and reliable.
Contribution
The paper proposes DivDist, a general bias measurement framework used to instantiate five concrete bias measures, and validates those measures with a testing protocol of eight criteria.
Findings
The measures predict biases in US employment (predictive validity).
The measures overcome deficiencies of prior bias measures.
The measures demonstrate conceptual, technical, and empirical robustness.
Abstract
How do we design measures of social bias that we trust? While prior work has introduced several measures, no measure has gained widespread trust: instead, mounting evidence argues we should distrust these measures. In this work, we design bias measures that warrant trust based on the cross-disciplinary theory of measurement modeling. To combat the frequently fuzzy treatment of social bias in NLP, we explicitly define social bias, grounded in principles drawn from social science research. We operationalize our definition by proposing a general bias measurement framework DivDist, which we use to instantiate 5 concrete bias measures. To validate our measures, we propose a rigorous testing protocol with 8 testing criteria (e.g. predictive validity: do measures predict biases in US employment?). Through our testing, we demonstrate considerable evidence to trust our measures, showing they…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Experimental Behavioral Economics Studies · Social Media and Politics
