Explainability for fair machine learning
Tom Begley, Tobias Schwedes, Christopher Frye, Ilya Feige

TL;DR
This paper introduces a Shapley value-based explainability method for assessing and understanding fairness in machine learning models, even when sensitive attributes are not directly used, and proposes a meta algorithm for fairness interventions.
Contribution
It presents a novel Shapley value approach to explain model fairness and a meta algorithm for training fairness interventions without sacrificing performance.
Findings
The method effectively attributes unfairness to input features.
The meta algorithm improves fairness with no performance loss.
It explains trade-offs between accuracy and fairness.
Abstract
As the decisions made or influenced by machine learning models increasingly impact our lives, it is crucial to detect, understand, and mitigate unfairness. But even simply determining what "unfairness" should mean in a given context is non-trivial: there are many competing definitions, and choosing between them often requires a deep understanding of the underlying task. It is thus tempting to use model explainability to gain insights into model fairness; however, existing explainability tools do not reliably indicate whether a model is indeed fair. In this work we present a new approach to explaining fairness in machine learning, based on the Shapley value paradigm. Our fairness explanations attribute a model's overall unfairness to individual input features, even in cases where the model does not operate on sensitive attributes directly. Moreover, motivated by the linearity of Shapley…
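To make the idea of attributing unfairness to input features concrete, here is a minimal sketch, not the authors' implementation. It assumes that "unfairness" is measured as the demographic parity gap (the difference in mean model output between two groups) and that features outside a coalition are marginalised by averaging over the dataset. The function name parity_gap_shapley, the toy model, and the synthetic data are all hypothetical choices made for illustration.

```python
import itertools
import math

import numpy as np


def parity_gap_shapley(model, X, a):
    """Exact Shapley attribution of the demographic parity gap to features.

    Assumed value function (an illustrative choice, not taken from the paper):
    v(S) = E[f_S(x) | a=1] - E[f_S(x) | a=0], where f_S(x) is the model output
    averaged over background rows whose features in S are overwritten by x.
    """
    n, d = X.shape

    def value(S):
        S = list(S)
        out = np.empty(n)
        for i in range(n):
            Z = X.copy()          # background rows supply features outside S
            Z[:, S] = X[i, S]     # features in S are fixed to row i's values
            out[i] = model(Z).mean()
        return out[a == 1].mean() - out[a == 0].mean()

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            # Standard Shapley weight for coalitions of size k.
            w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for S in itertools.combinations(others, k):
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi


# Synthetic data: feature 0 is a proxy for the sensitive attribute `a`,
# features 1 and 2 are independent noise. The model never sees `a`.
rng = np.random.default_rng(0)
n = 60
a = rng.integers(0, 2, size=n)
X = np.column_stack([
    a + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])


def model(Z):
    # Hypothetical scoring model; it ignores feature 2 entirely.
    return 1.0 / (1.0 + np.exp(-(2.0 * Z[:, 0] + 0.5 * Z[:, 1])))


phi = parity_gap_shapley(model, X, a)
scores = model(X)
gap = scores[a == 1].mean() - scores[a == 0].mean()
print("attributions:", phi)
print("sum of attributions:", phi.sum(), "parity gap:", gap)
```

Under these assumptions the attributions sum to the model's overall parity gap (the Shapley efficiency property), and a feature the model ignores receives an attribution of zero. The exact enumeration is exponential in the number of features, so this form is only practical for small toy problems.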
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning
