Marrying Fairness and Explainability in Supervised Learning
Przemyslaw Grabowicz, Nicholas Perello, Aarshee Mishra

TL;DR
This paper examines how fairness and explainability relate in supervised learning, formalizes direct and induced discrimination, and proposes post-processing methods that reduce discrimination while maintaining accuracy.
Contribution
The paper introduces formal measures of direct and induced discrimination, and develops post-processing techniques that remove the influence of the protected attribute while keeping model accuracy high.
Findings
State-of-the-art fair learning methods can induce discrimination.
Proposed post-processing methods effectively prevent direct discrimination.
The proposed methods maintain high model accuracy and reduce disparity measures.
Abstract
Machine learning algorithms that aid human decision-making may inadvertently discriminate against certain protected groups. We formalize direct discrimination as a direct causal effect of the protected attributes on the decisions, and induced discrimination as a change in the causal influence of non-protected features associated with the protected attributes. The measurements of marginal direct effect (MDE) and SHapley Additive exPlanations (SHAP) reveal that state-of-the-art fair learning methods can induce discrimination via association or reverse discrimination in synthetic and real-world datasets. To inhibit discrimination in algorithmic systems, we propose to nullify the influence of the protected attribute on the output of the system, while preserving the influence of remaining features. We introduce and study post-processing methods achieving such objectives, finding that they…
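The idea of nullifying the protected attribute's influence while preserving the influence of the remaining features can be illustrated with a small sketch. This is not the authors' implementation; it only shows one simple way to post-process a trained model, by averaging its output over an assumed marginal distribution of the protected attribute. The names `marginalize_protected` and `toy_predict`, the binary protected attribute, and the probabilities used are all hypothetical choices made for illustration.

```python
from typing import Callable, Sequence


def marginalize_protected(
    predict: Callable[[Sequence[float], float], float],
    x: Sequence[float],
    z_values: Sequence[float],
    z_probs: Sequence[float],
) -> float:
    """Average predict(x, z) over an assumed marginal distribution of z.

    The result depends on x only, so the protected attribute z no longer
    has a direct effect on the post-processed output.
    """
    if abs(sum(z_probs) - 1.0) > 1e-9:
        raise ValueError("z_probs must sum to 1")
    return sum(p * predict(x, z) for z, p in zip(z_values, z_probs))


def toy_predict(x: Sequence[float], z: float) -> float:
    """Hypothetical trained model that uses the protected attribute directly."""
    return 2.0 * x[0] + 3.0 * z


# Hypothetical data: one non-protected feature per row.
rows = [[1.0], [-0.5], [0.25]]

# Assumed marginal P(z = 0) = P(z = 1) = 0.5 (an illustrative choice).
fair_scores = [
    marginalize_protected(toy_predict, x, z_values=[0.0, 1.0], z_probs=[0.5, 0.5])
    for x in rows
]

# The original model changes its output when only z is switched ...
raw_gap = toy_predict(rows[0], 1.0) - toy_predict(rows[0], 0.0)

# ... while the post-processed score ignores z by construction, and the
# coefficient on the non-protected feature is unchanged.
slope = (fair_scores[0] - fair_scores[1]) / (rows[0][0] - rows[1][0])
```

In this toy case the post-processed score keeps the original coefficient on the non-protected feature and replaces the protected term with its average, which is the sense in which the influence of the remaining features is preserved. Real models with interactions between features and the protected attribute would need the more careful treatment the paper describes.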
