TL;DR
This paper identifies biases in common metrics used to measure model performance disparities across groups and introduces a simple, unbiased variance estimator to improve the accuracy of fairness assessments.
Contribution
The paper reveals statistical biases in existing disparity metrics and proposes a novel, easy-to-implement double-corrected variance estimator for unbiased measurement.
Findings
Existing disparity metrics are often biased estimators.
The proposed estimator provides unbiased variance estimates.
Applying the method changes significance conclusions in real data.
Abstract
When a model's performance differs across socially or culturally relevant groups--like race, gender, or the intersections of many such groups--it is often called "biased." While much of the work in algorithmic fairness over the last several years has focused on developing various definitions of model fairness (the absence of group-wise model performance disparities) and eliminating such "bias," much less work has gone into rigorously measuring it. In practice, it is important to have high-quality, human-digestible measures of model performance disparities, along with uncertainty quantification for them, that can serve as inputs into multi-faceted decision-making processes. In this paper, we show both mathematically and through simulation that many of the metrics used to measure group-wise model performance disparities are themselves statistically biased estimators of the underlying…
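To make the statistical-bias claim concrete, here is a minimal simulation sketch. It is not the paper's code, and the abstract above is truncated before it defines the proposed estimator, so everything below is an assumption chosen for illustration: the disparity metric is the variance of per-group accuracies, the correction subtracts the average estimated sampling variance of the group means (a standard bias correction for this quantity), and the function names, accuracies, and group sizes are hypothetical.

```python
import numpy as np


def naive_between_group_variance(samples):
    """Sample variance (ddof=1) of the observed per-group means.

    Biased upward for the variance of the true group means, because each
    observed mean also carries its own sampling noise.
    """
    means = np.array([np.mean(s) for s in samples])
    return float(np.var(means, ddof=1))


def corrected_between_group_variance(samples):
    """Naive estimate minus the average estimated sampling variance.

    Assumption for this sketch: groups are sampled independently and each
    has at least two observations. The result can be negative in a single
    sample; it is unbiased on average, not non-negative.
    """
    means = np.array([np.mean(s) for s in samples])
    # Estimated variance of each group mean: s_k^2 / n_k.
    sampling_vars = np.array([np.var(s, ddof=1) / len(s) for s in samples])
    return float(np.var(means, ddof=1) - sampling_vars.mean())


def simulate(true_rates, group_sizes, n_reps=2000, seed=0):
    """Average both estimators over repeated draws of 0/1 correctness labels."""
    rng = np.random.default_rng(seed)
    naive, corrected = [], []
    for _ in range(n_reps):
        samples = [
            rng.binomial(1, p, size=n).astype(float)
            for p, n in zip(true_rates, group_sizes)
        ]
        naive.append(naive_between_group_variance(samples))
        corrected.append(corrected_between_group_variance(samples))
    target = float(np.var(np.asarray(true_rates, dtype=float), ddof=1))
    return target, float(np.mean(naive)), float(np.mean(corrected))


if __name__ == "__main__":
    # Hypothetical per-group accuracies and evaluation-set sizes.
    target, naive_avg, corrected_avg = simulate(
        true_rates=[0.80, 0.82, 0.78, 0.81],
        group_sizes=[50, 40, 60, 30],
    )
    print(f"true between-group variance: {target:.5f}")
    print(f"naive estimator (average):   {naive_avg:.5f}")
    print(f"corrected estimator (avg):   {corrected_avg:.5f}")
```

With small groups and similar true accuracies, the naive estimate is dominated by sampling noise and can suggest a large disparity where little exists, while the corrected version averages out close to the true value. The gap shrinks as group sizes grow, since each group mean's sampling variance falls as 1/n.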
