Fairness of Explanations in Artificial Intelligence (AI): A Unifying Framework, Axioms, and Future Direction toward Responsible AI
Gideon Popoola, John Sheppard

TL;DR
This paper unifies the study of fairness and explainability in AI, highlighting the importance of explanation fairness and proposing a theoretical framework and evaluation workflow.
Contribution
It introduces a conditional invariance framework for explanation fairness, a taxonomy of explanation inequity, and a practical evaluation workflow.
Findings
Identifies procedural bias as a gap between fairness and explanation.
Proposes a formal framework for explanation fairness based on conditional invariance.
Provides a taxonomy and mechanisms for understanding explanation inequity.
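The summary above does not spell out the paper's formal definition, so the following is only a minimal sketch of one possible reading of conditional invariance: within a stratum of comparable instances (here, assumed to be instances with the same model prediction), the average feature attribution should not differ by protected group. The function name `conditional_attribution_gap`, the choice of stratum, and the toy data are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from statistics import mean


def conditional_attribution_gap(attributions, groups, strata):
    """Largest per-feature gap in mean attribution between groups, per stratum.

    attributions: per-instance attribution vectors (lists of floats), e.g. from
                  any post-hoc explainer (hypothetical input, not a library API)
    groups:       protected-group label for each instance
    strata:       conditioning label for each instance (assumed: the prediction)
    Returns {stratum: max over features of (max group mean - min group mean)}.
    """
    # Bucket attribution vectors by stratum, then by protected group.
    buckets = defaultdict(lambda: defaultdict(list))
    for attr, g, s in zip(attributions, groups, strata):
        buckets[s][g].append(attr)

    gaps = {}
    for s, by_group in buckets.items():
        if len(by_group) < 2:
            continue  # no cross-group comparison is possible in this stratum
        n_features = len(next(iter(by_group.values()))[0])
        worst = 0.0
        for j in range(n_features):
            means = [mean(row[j] for row in rows) for rows in by_group.values()]
            worst = max(worst, max(means) - min(means))
        gaps[s] = worst
    return gaps


# Toy data (made up for illustration): two features, two groups, two strata.
attributions = [[0.6, 0.1], [0.2, 0.5], [0.5, 0.2], [0.5, 0.2]]
groups = ["a", "b", "a", "b"]
strata = [1, 1, 0, 0]
gaps = conditional_attribution_gap(attributions, groups, strata)
```

A gap near zero in every stratum is consistent with this simplified notion of invariance; a large gap flags a stratum where the two groups receive systematically different explanations even though the model's output is the same. A difference in means is a deliberately crude statistic, and a real evaluation would likely need a distributional test and uncertainty estimates.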
Abstract
Machine learning algorithms are being used in high-stakes decisions, including those in criminal justice, healthcare, credit, and employment. The research community has responded with two largely independent research fields: algorithmic fairness, which targets equitable outcomes, and explainable AI (XAI), which targets interpretable reasoning. This survey identifies and maps a novel blind spot at their intersection: a model can satisfy every standard fairness criterion in its outputs while being profoundly unfair in its reasoning process. We refer to this as procedural bias, and mitigating it requires treating the fairness of explanations as a distinct object of scientific study. To our knowledge, we provide the first unified theoretical treatment and literature review of this emerging field and elucidate the drawbacks of post-hoc explainers in certifying…
