Evaluating Model Bias Requires Characterizing its Mistakes
Isabela Albuquerque, Jessica Schrouff, David Warde-Farley, Taylan Cemgil, Sven Gowal, and Olivia Wiles

TL;DR
This paper introduces SkewSize, a new metric inspired by hypothesis testing, to better characterize and quantify model biases by analyzing mistakes across subgroups, revealing biases overlooked by traditional metrics.
Contribution
The paper proposes SkewSize, a flexible and principled metric that characterizes model bias by analyzing the mistakes in a model's predictions. It applies to multi-class classifiers and extends to the open-vocabulary setting of generative models.
Findings
SkewSize uncovers biases that standard metrics such as worst-group accuracy and accuracy gap do not detect.
It effectively highlights biases in vision and vision-language models.
SkewSize provides insights into the impact of bias mitigation techniques.
Abstract
The ability to properly benchmark model performance in the face of spurious correlations is important to both build better predictors and increase confidence that models are operating as intended. We demonstrate that characterizing (as opposed to simply quantifying) model mistakes across subgroups is pivotal to properly reflect model biases, which are ignored by standard metrics such as worst-group accuracy or accuracy gap. Inspired by the hypothesis testing framework, we introduce SkewSize, a principled and flexible metric that captures bias from mistakes in a model's predictions. It can be used in multi-class settings or generalised to the open vocabulary setting of generative models. SkewSize is an aggregation of the effect size of the interaction between two categorical variables: the spurious variable representing the bias attribute and the model's prediction. We demonstrate the…
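The idea in the abstract, measuring the effect size of the interaction between the bias attribute and the model's prediction and then aggregating it, can be illustrated with a small sketch. The abstract does not name the effect size or the aggregation, so the choices here are assumptions made for illustration only: Cramér's V as the per-class effect size and the Fisher-Pearson skewness coefficient as the aggregate. The function names (`cramers_v`, `per_class_effect_sizes`, `skewness`) and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def cramers_v(table):
    """Cramér's V effect size for a 2-D contingency table of counts (assumed choice)."""
    table = np.asarray(table, dtype=float)
    # Drop empty rows/columns so that no expected count is zero.
    table = table[table.sum(axis=1) > 0][:, table.sum(axis=0) > 0]
    n = table.sum()
    rows, cols = table.shape
    if min(rows, cols) < 2:
        return 0.0  # no association can be measured with a single row or column
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(rows, cols) - 1))))

def per_class_effect_sizes(y_true, y_pred, group):
    """For each true class, build a subgroup-by-prediction table and compute its effect size."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    effects = {}
    for cls in np.unique(y_true):
        mask = y_true == cls
        groups, g_idx = np.unique(group[mask], return_inverse=True)
        preds, p_idx = np.unique(y_pred[mask], return_inverse=True)
        table = np.zeros((len(groups), len(preds)))
        np.add.at(table, (g_idx, p_idx), 1)  # count each (subgroup, prediction) pair
        effects[cls.item()] = cramers_v(table)
    return effects

def skewness(values):
    """Fisher-Pearson coefficient of skewness (assumed aggregation)."""
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    m2 = (centered ** 2).mean()
    if m2 == 0:
        return 0.0
    return float((centered ** 3).mean() / m2 ** 1.5)

# Toy data: for true class 0, both subgroups are 50% accurate, but subgroup "a"
# errs toward class 1 while subgroup "b" errs toward class 2.
y_true = [0] * 8 + [1] * 8
group = ["a"] * 4 + ["b"] * 4 + ["a"] * 4 + ["b"] * 4
y_pred = [0, 0, 1, 1, 0, 0, 2, 2,   # class 0: same accuracy, different mistakes
          1, 1, 0, 0, 1, 1, 0, 0]   # class 1: identical behaviour in both subgroups

effects = per_class_effect_sizes(y_true, y_pred, group)
aggregate = skewness(list(effects.values()))
print(effects, aggregate)
```

In this toy example the accuracy gap between subgroups is zero for both classes, yet the effect size for class 0 is clearly non-zero because the two subgroups make different kinds of mistakes. This is the sort of discrepancy that an accuracy-based metric alone would not surface.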
Taxonomy
Topics: Risk and Safety Analysis · Complex Systems and Decision Making
