TL;DR
This paper introduces UNQOVER, a framework that uses underspecified questions to measure stereotyping biases in question answering models. It finds biases in every model studied and examines how model size and fine-tuning affect them.
Contribution
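The paper presents UNQOVER, a framework for probing bias through underspecified questions, together with a formalism that separates two reasoning errors (positional dependence and question independence) from the bias measurement. It applies the framework to five transformer-based QA models.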
Findings
All models exhibit notable stereotypes across classes.
Larger models tend to have higher bias.
Fine-tuning effects on bias vary with dataset and model size.
Abstract
While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions. We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence. We design a formalism that isolates the aforementioned errors. As case studies, we use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion. We probe five transformer-based QA models trained on two QA datasets, along with their underlying language models. Our broad study reveals that (1) all these models, with and without fine-tuning, have notable stereotyping biases in these classes; (2) larger models often have…
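The two reasoning errors named in the abstract can be illustrated with a small sketch. This is not taken from the paper: it assumes one plausible way to cancel them, averaging a subject's score over both subject orders (for positional dependence) and subtracting the score obtained under a negated question (for question independence). The scorer signature, the template, the questions, and the toy scorers are all hypothetical.

```python
from typing import Callable

# Hypothetical scorer: (context, question, candidate) -> score in [0, 1].
# In practice this would wrap a QA model; here it is just a callable.
Scorer = Callable[[str, str, str], float]


def subject_score(score: Scorer, template: str, question: str,
                  negated_question: str, x1: str, x2: str) -> float:
    """Score for x1, averaged over both subject orders, minus the same
    average under the negated question (an assumed correction scheme)."""
    contexts = [template.format(a=x1, b=x2), template.format(a=x2, b=x1)]
    asked = sum(score(c, question, x1) for c in contexts) / 2
    negated = sum(score(c, negated_question, x1) for c in contexts) / 2
    return asked - negated


def comparative_bias(score: Scorer, template: str, question: str,
                     negated_question: str, x1: str, x2: str) -> float:
    """Positive values mean x1 is favoured over x2 for this question."""
    s1 = subject_score(score, template, question, negated_question, x1, x2)
    s2 = subject_score(score, template, question, negated_question, x2, x1)
    return 0.5 * (s1 - s2)


# Toy scorers (illustrative assumptions, not real models).
def positional_scorer(context: str, question: str, candidate: str) -> float:
    # Always prefers whichever subject is mentioned first.
    return 0.9 if context.startswith(candidate) else 0.1


def stereotyped_scorer(context: str, question: str, candidate: str) -> float:
    # Prefers "John" for the question and "Mary" for its negation.
    prefers_john = "never" not in question
    return 0.8 if (candidate == "John") == prefers_john else 0.2


# Hypothetical underspecified context: neither subject is implied by the text.
TEMPLATE = "{a} got off the flight to visit {b}."
QUESTION = "Who was a senator?"
NEGATED = "Who was never a senator?"
```

Under these assumptions, a scorer that only reacts to subject position gets a comparative bias of zero, while a scorer whose preference flips with the question's meaning gets a non-zero value. A naive reading of raw scores would flag both as biased.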
