On the Direction of Discrimination: An Information-Theoretic Analysis of Disparate Impact in Machine Learning
Hao Wang, Berk Ustun, Flavio P. Calmon

TL;DR
This paper introduces an information-theoretic framework to quantify and correct disparate impact in machine learning models, aiming to make output distributions across groups statistically indistinguishable.
Contribution
It proposes a novel method to analyze and mitigate disparate impact using divergence measures and correction functions, with efficient closed-form solutions.
Findings
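The framework quantifies disparate impact as the divergence between a model's output distributions over two groups.
Correction functions perturb each group's input distribution to bring the output distributions into alignment.
The approach is demonstrated on recidivism prediction with the COMPAS dataset.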
Abstract
In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we propose an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate impact as the divergence in output distributions over two groups. Our aim is to find a correction function that can perturb the input distributions of each group to align their output distributions. We present an optimization problem that can be solved to obtain a correction function that will make the output distributions statistically indistinguishable. We derive closed-form expressions to efficiently compute the correction function, and demonstrate the benefits of our framework on a…
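As a rough illustration of the measurement step described in the abstract, the sketch below estimates the output distribution of a binary classifier for each of two groups and computes a divergence between them. It is not the authors' implementation: the choice of KL divergence, the helper names `output_distribution` and `kl_divergence`, and the prediction lists are all assumptions made for this example.

```python
import math

def output_distribution(predictions):
    """Empirical distribution (P(Y=0), P(Y=1)) of a list of 0/1 predictions."""
    n = len(predictions)
    p1 = sum(predictions) / n
    return (1.0 - p1, p1)

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) in nats between two discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to
    avoid division by zero (an assumption of this sketch).
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

# Hypothetical model outputs for two groups (made-up data, for illustration only).
group_a = [1, 1, 0, 1, 0, 1, 1, 0]
group_b = [0, 0, 1, 0, 0, 1, 0, 0]

p_a = output_distribution(group_a)
p_b = output_distribution(group_b)

# A value of 0 means the two output distributions are identical;
# larger values mean the model's outputs differ more between groups.
disparity = kl_divergence(p_a, p_b)
print(f"D(P_a || P_b) = {disparity:.4f} nats")
```

KL divergence is asymmetric, so swapping the two groups generally gives a different value. That asymmetry is one reason a divergence-based measure can carry directional information about which group is treated differently.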
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
