Measures of classification bias derived from sample size analysis
Ioannis Ivrissimtzis, Shauna Concannon, Matthew Houliston, Graham Roberts

TL;DR
This paper introduces a bias measure for classifiers based on the sample size needed to statistically detect differences in error rates across demographics, and compares it with the commonly used difference and ratio of error rates.
Contribution
It proposes a bias measure rooted in sample size analysis. The measure ranks algorithms differently from existing metrics and may extend to more complex demographic settings.
Findings
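The measure ties bias to the sample size required for error rate differences to reach statistical significance: the larger the required sample, the less biased the classifier.
It ranks classifiers differently from the difference and the ratio of the two error rates.
The measure satisfies basic desirable properties, which follow from its grounding in a standard statistical test (the chi-squared test).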
Abstract
We propose the use of a simple intuitive principle for measuring algorithmic classification bias: the significance of the differences in a classifier's error rates across the various demographics is inversely commensurate with the sample size required to statistically detect them. That is, if large sample sizes are required to statistically establish biased behavior, the algorithm is less biased, and vice versa. In a simple setting, we assume two distinct demographics, and non-parametric estimates of the error rates on them, e₁ and e₂, respectively. We use a well-known approximate formula for the sample size of the chi-squared test, and verify some basic desirable properties of the proposed measure. Next, we compare the proposed measure with two other commonly used statistics, the difference e₂ − e₁ and the ratio e₂/e₁ of the error rates. We establish that the proposed measure is…
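The sample size principle in the abstract can be sketched as follows. The summary does not state which approximate formula the authors use, so this sketch assumes the standard normal-approximation formula for comparing two independent proportions (equivalent to the uncorrected 2×2 chi-squared test), with a two-sided significance level alpha = 0.05 and power 0.8. The function name `required_sample_size`, these defaults, and the example error rates are illustrative assumptions, not the authors' code or data.

```python
from math import sqrt
from statistics import NormalDist


def required_sample_size(e1: float, e2: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate per-group sample size to detect e1 != e2 (two-sided test).

    ASSUMPTION: uses the textbook normal-approximation formula for two
    proportions (equivalent to the uncorrected 2x2 chi-squared test);
    the paper's exact formula may differ.
    """
    if not (0 < e1 < 1 and 0 < e2 < 1):
        raise ValueError("error rates must lie strictly between 0 and 1")
    if e1 == e2:
        return float("inf")  # identical rates: no finite sample detects a difference
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (e1 + e2) / 2                          # pooled error rate under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(e1 * (1 - e1) + e2 * (1 - e2))) ** 2
    return numerator / (e2 - e1) ** 2


# Hypothetical error-rate pairs (e1, e2) chosen to contrast the three statistics.
pairs = {
    "A": (0.01, 0.02),  # difference 0.01, ratio 2
    "B": (0.40, 0.41),  # same difference as A, ratio close to 1
    "C": (0.20, 0.40),  # same ratio as A, much larger difference
}
for name, (e1, e2) in pairs.items():
    n = required_sample_size(e1, e2)
    print(f"{name}: diff={e2 - e1:.3f} ratio={e2 / e1:.3f} n per group={n:,.0f}")
```

In this sketch, pairs A and B have the same difference, yet B needs roughly sixteen times as many samples per group, so the sample size view treats B as less biased. Pairs A and C share a ratio of 2, yet C's gap is detectable with fewer than a hundred samples per group, so C is treated as far more biased. This is the kind of ranking disagreement the abstract refers to.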
Taxonomy
Topics: Face and Expression Recognition · Imbalanced Data Classification Techniques · Advanced Statistical Methods and Models
