Auditing Predictive Models for Intersectional Biases
Kate S. Boxer, Edward McFowland III, Daniel B. Neill

TL;DR
This paper introduces Conditional Bias Scan (CBS), a novel framework for detecting intersectional biases in predictive models, addressing limitations of aggregate fairness measures by identifying specific subgroups with significant bias.
Contribution
The paper presents CBS, a flexible and powerful auditing method that detects intersectional biases in classification models, incorporating multiple fairness definitions and outperforming existing methods.
Findings
CBS detects previously unidentified intersectional biases in COMPAS.
CBS has higher bias detection power than similar subgroup fairness methods.
The framework can incorporate various fairness definitions for probabilistic and binarized predictions.
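The last finding distinguishes probabilistic from binarized predictions. As a hedged illustration of what that distinction can mean for one common fairness definition, the sketch below computes a false-positive-rate gap two ways: on raw scores (mean score among true negatives) and on thresholded predictions. The scores, labels, the 0.5 threshold, and the helper names `soft_fpr` and `hard_fpr` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def soft_fpr(scores, labels):
    """Probabilistic version: mean predicted probability among true negatives
    (often called balance for the negative class)."""
    return mean(s for s, y in zip(scores, labels) if y == 0)


def hard_fpr(scores, labels, threshold=0.5):
    """Binarized version: share of true negatives predicted positive at `threshold`."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s >= threshold for s in negatives) / len(negatives)


# Hypothetical scores and labels for a protected and a non-protected group.
protected = ([0.6, 0.4, 0.7, 0.9], [0, 0, 0, 1])
non_protected = ([0.3, 0.2, 0.55, 0.8], [0, 0, 0, 1])

soft_gap = soft_fpr(*protected) - soft_fpr(*non_protected)
hard_gap = hard_fpr(*protected) - hard_fpr(*non_protected)
print(f"soft FPR gap: {soft_gap:.3f}, hard FPR gap: {hard_gap:.3f}")
```

Both gaps point in the same direction here, but their magnitudes differ, which is why an auditing method benefits from supporting both forms.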
Abstract
Predictive models that satisfy group fairness criteria in aggregate for members of a protected class, but do not guarantee subgroup fairness, could produce biased predictions for individuals at the intersection of two or more protected classes. To address this risk, we propose Conditional Bias Scan (CBS), a flexible auditing framework for detecting intersectional biases in classification models. CBS identifies the subgroup for which there is the most significant bias against the protected class, as compared to the equivalent subgroup in the non-protected class, and can incorporate multiple commonly used fairness definitions for both probabilistic and binarized predictions. We show that this methodology can detect previously unidentified intersectional and contextual biases in the COMPAS pre-trial risk assessment tool and has higher bias detection power compared to similar methods that…
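The risk described in the abstract, aggregate parity that hides intersectional bias, can be made concrete with a brute-force comparison on hypothetical toy data. This is not CBS: per the abstract, CBS identifies the subgroup with the most statistically significant bias, whereas this sketch only enumerates three subgroups and reports raw differences in flag rates. The race and sex labels, the 10-person cells, and the helpers `flag_rate` and `compare_subgroups` are invented for illustration.

```python
# Hypothetical toy data: (race, sex, flagged_high_risk); 10 people per race-sex cell.
FLAGGED_PER_CELL = {("A", "M"): 8, ("A", "F"): 2, ("B", "M"): 2, ("B", "F"): 8}
records = [
    (race, sex, int(i < n_flagged))
    for (race, sex), n_flagged in FLAGGED_PER_CELL.items()
    for i in range(10)
]


def flag_rate(rows):
    """Share of rows flagged high risk."""
    return sum(flag for _, _, flag in rows) / len(rows)


def compare_subgroups(rows, protected_race="B"):
    """Brute-force comparison: for the whole population (key None) and for each
    sex subgroup, return the protected group's flag rate minus everyone else's."""
    gaps = {}
    for sex in (None, "F", "M"):
        subgroup = [r for r in rows if sex is None or r[1] == sex]
        prot = [r for r in subgroup if r[0] == protected_race]
        rest = [r for r in subgroup if r[0] != protected_race]
        gaps[sex] = flag_rate(prot) - flag_rate(rest)
    return gaps


gaps = compare_subgroups(records)
worst = max(gaps, key=gaps.get)
print(gaps, "most biased subgroup:", worst)
```

By construction, both races are flagged at 50% overall and both sexes are flagged at 50% overall, yet protected women are flagged at 80% versus 20% for non-protected women. A marginal audit on race alone or sex alone would miss this.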
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The proposed method is relatively simple to implement and rigorously grounded in the hypothesis testing literature.
- The method is flexible enough to accommodate the commonly discussed fairness metrics.
- The synthetic experiments are well designed to demonstrate the efficacy of the method across different scenarios and metrics.
- The methods compared against seem quite old, and I would be interested to see how CBS compares to newer methods in the literature (e.g., [1]).
- More real-world dataset studies would strengthen the paper (e.g., folktables [2]).
- The writing is verbose at times and could be more concise, especially in Section 3, which describes the methods.
[1] Cherian, John J., and Emmanuel J. Candès. "Statistical inference for fairness auditing." Journal of Machine Learnin
- **[Problem Importance]** The authors study an important problem.
- **[Practicality]** The proposed method can accommodate a large number of fairness definitions that prior works cannot accommodate.
- **[Clarity]** Several important aspects of the paper are not articulated clearly. In particular, I found the authors’ experimental design difficult to follow. A few examples:
  - Section 4: What is a “row”? The authors refer to specific rows or “row $i$” without defining this term. Is a row a particular data point, or is it a row of the covariates?
  - When defining the true log-odds in their semi-synthetic data, the authors say, “We use these weights to produce the true log-o
The main strengths of the paper are: S1) the motivation of the methodology is relevant, as identifying intersectional biases (in a tractable manner) is an open issue in the fairness literature; S2) the empirical evaluation supports the effectiveness of the method; S3) the algorithmic procedure for detecting the most significant subgroup seems novel.
The main shortcomings of the current version of the paper are: W1) I think a few arguments should be taken into account and clarified further: * In [Ruggieri et al., 2023], the authors show that algorithmic fairness objectives are not compositional, i.e., even if the classifier is fair on some regions of the input space, the overall system is not necessarily fair, due to the emergence of Yule’s effect. This could hinder CBS's ability to evaluate the overall fairness of the syst
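The non-compositionality concern in W1 can be illustrated with a Simpson's-paradox-style toy example using hypothetical counts and region names. Within each region, protected and other individuals are flagged at identical rates, yet the pooled rates differ sharply because the two groups are distributed differently across the regions. This sketch does not establish that it matches the exact formulation of Yule's effect in Ruggieri et al.; it only shows why parity on parts of the input space need not imply parity overall.

```python
# Hypothetical counts: (group, region) -> (number flagged, number of people).
COUNTS = {
    ("protected", "R1"): (72, 90),
    ("protected", "R2"): (2, 10),
    ("other", "R1"): (8, 10),
    ("other", "R2"): (18, 90),
}


def flag_rate(group, region=None):
    """Flag rate for `group`, within one region or pooled over all regions."""
    cells = [
        counts
        for (g, r), counts in COUNTS.items()
        if g == group and (region is None or r == region)
    ]
    return sum(f for f, _ in cells) / sum(n for _, n in cells)


for region in ("R1", "R2", None):
    print(region or "pooled", flag_rate("protected", region), flag_rate("other", region))
```

Each region shows parity (80% versus 80%, 20% versus 20%), but the pooled rates are 74% versus 26%.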
1. The paper provides a framework for auditing intersectional biases (i.e., detecting fairness gerrymandering), a crucial area often overlooked in fairness assessments.
2. The proposed method can accommodate different group fairness metrics and can effectively scan numerous subgroups.
1. The reliability of the estimation of expectations I under the null hypothesis depends on having well-specified models for estimating the propensity scores of the protected class.
2. The paper is quite dense and challenging to follow. It would benefit from more intuitive explanations or examples illustrating why the overall method is effective in real-world scenarios, which would help readers better understand the practical implications and the rationale behind the approach.
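The first weakness above concerns propensity scores for the protected class. One standard way such scores enter an expectation under a null of no bias is odds weighting: non-protected individuals are reweighted by e(x) / (1 - e(x)), where e(x) is the probability of being protected given covariates x, so that their covariate mix matches the protected class. This summary does not state that CBS uses exactly this estimator, so treat the sketch as an assumption. It uses a single discrete covariate, so e(x) can be estimated from stratum frequencies and cannot be misspecified. With continuous covariates, a fitted model (for example, logistic regression) would replace the counts, and misspecification of that model would bias the estimate. The data and the helper `expected_protected_mean` are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: (stratum of a discrete covariate, is_protected, model score).
# Within each stratum both classes receive the same score, so there is no bias given x.
DATA = (
    [("low", True, 0.2)] * 2 + [("high", True, 0.6)] * 8
    + [("low", False, 0.2)] * 8 + [("high", False, 0.6)] * 2
)


def expected_protected_mean(rows):
    """Estimate the protected class's mean score under the null of no bias by
    reweighting non-protected rows with propensity odds e(x) / (1 - e(x)).

    Assumes every stratum contains non-protected rows (overlap)."""
    n_prot = Counter(x for x, p, _ in rows if p)
    n_rest = Counter(x for x, p, _ in rows if not p)
    # With the empirical propensity e(x) = n_prot / (n_prot + n_rest),
    # the odds e(x) / (1 - e(x)) reduce to n_prot / n_rest in each stratum.
    weighted = [(n_prot[x] / n_rest[x], s) for x, p, s in rows if not p]
    return sum(w * s for w, s in weighted) / sum(w for w, _ in weighted)


observed = mean(s for _, p, s in DATA if p)
naive = mean(s for _, p, s in DATA if not p)
expected = expected_protected_mean(DATA)
print(f"observed {observed:.2f}, naive {naive:.2f}, reweighted {expected:.2f}")
```

The naive comparison (0.52 versus 0.28) suggests bias, but it reflects only the different covariate mix. The reweighted expectation matches the observed protected mean, which correctly signals no bias given x.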
Taxonomy
Topics: Regulation and Compliance Studies · Qualitative Comparative Analysis Research · Gender Politics and Representation
