Iterative Orthogonal Feature Projection for Diagnosing Bias in Black-Box Models
Julius Adebayo, Lalana Kagal

TL;DR
This paper introduces an iterative orthogonal projection method to interpret black-box models, enabling quantification of input attribute importance to assess potential bias and discrimination.
Contribution
The paper proposes an iterative orthogonal projection technique that interprets a black-box model by quantifying how strongly its predictions depend on each input attribute, so that the model's fairness can be evaluated.
Findings
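Quantifies the relative dependence of a black-box model on each of its input attributes
Uses the relative significance of the inputs to flag potential bias in a model
Provides a tool for assessing a model's fairness (or discriminatory extent)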
Abstract
Predictive models are increasingly deployed for the purpose of determining access to services such as credit, insurance, and employment. Despite potential gains in productivity and efficiency, several potential problems have yet to be addressed, particularly the potential for unintentional discrimination. We present an iterative procedure, based on orthogonal projection of input attributes, for enabling interpretability of black-box predictive models. Through our iterative procedure, one can quantify the relative dependence of a black-box model on its input attributes. The relative significance of the inputs to a predictive model can then be used to assess the fairness (or discriminatory extent) of such a model.
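The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a purely linear orthogonalization, a single pass per attribute, and a mean absolute change in predictions as the dependence score. The names `orthogonalize_against`, `dependence_scores`, and the toy `predict` function are hypothetical and chosen only for illustration.

```python
import numpy as np


def orthogonalize_against(X, j):
    """Remove from every column of X its linear component along column j.

    Columns are mean-centered first, so the result is linearly uncorrelated
    with column j. Column j itself collapses to its mean (a constant).
    """
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    Xc = X - means
    v = Xc[:, j]
    denom = v @ v
    if denom == 0.0:
        # Column j is constant, so there is nothing to project out.
        return X.copy()
    coeffs = (Xc.T @ v) / denom          # projection coefficient per column
    return Xc - np.outer(v, coeffs) + means


def dependence_scores(predict, X):
    """Assumed score: mean absolute change in the black-box output when
    attribute j is projected out of the whole input matrix."""
    X = np.asarray(X, dtype=float)
    base = predict(X)
    return np.array([
        np.mean(np.abs(base - predict(orthogonalize_against(X, j))))
        for j in range(X.shape[1])
    ])


# Toy data: column 1 is correlated with column 0, column 2 is independent.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + 0.6 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])


def predict(X):
    # Hypothetical black box: it never reads column 1 directly.
    return 2.0 * X[:, 0] + 0.5 * X[:, 2]


scores = dependence_scores(predict, X)
print(scores)
```

In this toy setup the model never reads column 1, yet its score is well above zero because column 1 is correlated with column 0, which the model does use. That kind of indirect dependence is what matters when the projected-out attribute is a protected one such as race or gender.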
Taxonomy
Topics: Imbalanced Data Classification Techniques · Ethics and Social Impacts of AI · Law, Economics, and Judicial Systems
Methods: Interpretability
