Do Fair Models Reason Fairly? Counterfactual Explanation Consistency for Procedural Fairness in Credit Decisions
Gideon Popoola, John Sheppard

TL;DR
This paper introduces Counterfactual Explanation Consistency (CEC), a framework to detect and reduce hidden procedural bias in fair credit decision models by aligning reasoning across individuals and their counterfactuals.
Contribution
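The paper proposes the CEC framework, which comprises a nearest-neighbor counterfactual generation method, a modified baseline for integrated gradient comparisons, an individual-level procedural fairness metric, and a corresponding training loss. Together these target hidden biases that outcome fairness metrics do not capture.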
Findings
Outcome-fair models can still apply different reasoning, leading to hidden bias.
Training with CEC significantly reduces hidden bias at the cost of a modest loss in utility.
Experiments on multiple datasets validate the effectiveness of CEC in detecting and mitigating bias.
Abstract
Machine learning algorithms in socially sensitive domains (e.g., credit decisions) often focus on equalizing predictive outcomes. However, satisfying these metrics does not guarantee that models use the same reasoning for different groups. We show that existing outcome-fair models can still apply fundamentally different reasoning to individuals, a "hidden procedural bias" missed by standard fairness metrics and algorithms. We propose Counterfactual Explanation Consistency (CEC), a framework that detects and mitigates this bias by aligning feature attributions between individuals and their counterfactual counterparts. Key contributions include a nearest-neighbor counterfactual generation method, a modified baseline for integrated gradient comparisons, an individual-level procedural fairness metric, and a corresponding training loss. We introduce a taxonomy identifying "Regime B"…
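The idea of comparing feature attributions between an individual and a counterfactual counterpart can be illustrated with a small sketch. This is not the authors' implementation. The toy scorer with group-specific weights (`WEIGHTS`), the all-zero baseline (the paper uses a modified baseline that is not reproduced here), the Euclidean nearest-neighbor rule, and the normalized L1 gap in `attribution_gap` are all assumptions chosen only to make the example runnable.

```python
import math

# Hypothetical toy scorer: each group is scored with its own weight vector,
# so two people can receive the same score for different reasons.
WEIGHTS = {0: [2.0, 0.0], 1: [0.0, 2.0]}

def score(x, group):
    """Logistic score standing in for a credit model (assumption)."""
    z = sum(w * xi for w, xi in zip(WEIGHTS[group], x))
    return 1.0 / (1.0 + math.exp(-z))

def score_grad(x, group):
    """Analytic gradient of the logistic score with respect to x."""
    p = score(x, group)
    return [p * (1.0 - p) * w for w in WEIGHTS[group]]

def integrated_gradients(x, group, baseline, steps=50):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, score_grad(point, group))]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

def nearest_counterfactual(x, group, data, groups):
    """Closest row (Euclidean distance) belonging to a different group."""
    candidates = [row for row, g in zip(data, groups) if g != group]
    return min(candidates, key=lambda row: math.dist(row, x))

def attribution_gap(a, c):
    """L1 distance between attribution vectors normalized to unit L1 mass."""
    def normalize(v):
        mass = sum(abs(t) for t in v) or 1.0
        return [t / mass for t in v]
    return sum(abs(p - q) for p, q in zip(normalize(a), normalize(c)))

# Made-up data: two features per applicant, one group label each.
DATA = [[1.0, 1.0], [1.0, 1.1], [3.0, 0.2], [0.9, 1.0]]
GROUPS = [0, 0, 1, 1]
BASELINE = [0.0, 0.0]

x = DATA[0]
cf = nearest_counterfactual(x, 0, DATA, GROUPS)
attr_x = integrated_gradients(x, 0, BASELINE)
attr_cf = integrated_gradients(cf, 1, BASELINE)

print("score gap:", abs(score(x, 0) - score(cf, 1)))
print("attribution gap:", attribution_gap(attr_x, attr_cf))
```

In this toy setup the individual and the counterfactual receive the same score, yet the attribution mass sits on different features, which is the kind of disagreement an explanation-consistency check is meant to surface. A training loss in this spirit would presumably penalize such a gap, but the exact form used in the paper is not given in the text above.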
