Quantifying Feature Contributions to Overall Disparity Using Information Theory
Sanghamitra Dutta, Praveen Venkatesh, Pulkit Grover

TL;DR
This paper introduces an information-theoretic approach to quantify the contribution of individual features to observed disparities in machine learning decisions, especially when model access or intervention is limited.
Contribution
It proposes a novel distributional method using Partial Information Decomposition to measure feature contributions without requiring model access or interventions.
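As a rough illustration of what "distributional" means here, the sketch below estimates mutual information from samples alone, with no access to the model. It does not implement Partial Information Decomposition. Using mutual information between a protected attribute `z` and the decision `y_hat` as the disparity measure is an assumption of this sketch and is not stated in the summary. The function name and the toy arrays are hypothetical.

```python
import numpy as np

def mutual_information(a, b):
    """Empirical mutual information I(A; B), in bits, for two discrete 1-D samples."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    # Map arbitrary discrete labels to integer indices 0..k-1.
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    # Build the empirical joint distribution by counting co-occurrences.
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)
    joint /= joint.sum()
    # Product of marginals, same shape as the joint.
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    indep = pa @ pb
    mask = joint > 0  # skip zero cells, where p * log(p) is taken as 0
    return float((joint[mask] * np.log2(joint[mask] / indep[mask])).sum())

# Hypothetical toy data: z is the protected attribute.
z  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x1 = np.array([0, 0, 0, 1, 1, 1, 1, 0])  # statistically related to z
x2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # independent of z in this sample
y_hat = x1.copy()                        # observed decisions (model itself unseen)

disparity = mutual_information(z, y_hat)
info_x1 = mutual_information(z, x1)
info_x2 = mutual_information(z, x2)
```

Everything above is computed from the joint samples of `z`, the features, and `y_hat`. How such quantities are split into per-feature contributions is the part the paper addresses and is not reproduced here.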
Findings
Illustrated the difference between distributional and interventional explanations
Developed a method to quantify redundant information related to disparities
Applied the technique in a case study to demonstrate its utility
Abstract
When a machine-learning algorithm makes biased decisions, it can be helpful to understand the sources of disparity to explain why the bias exists. Towards this, we examine the problem of quantifying the contribution of each individual feature to the observed disparity. If we have access to the decision-making model, one potential approach (inspired by intervention-based approaches in explainability literature) is to vary each individual feature (while keeping the others fixed) and use the resulting change in disparity to quantify its contribution. However, we may not have access to the model or be able to test/audit its outputs for individually varying features. Furthermore, the decision may not always be a deterministic function of the input features (e.g., with human-in-the-loop). For these situations, we might need to explain contributions using purely distributional (i.e.,…
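The intervention-based approach mentioned in the abstract (vary one feature, keep the others fixed, record the change in disparity) can be sketched as follows. This is a minimal illustration, not the authors' procedure. The disparity measure (statistical parity gap), the choice to "vary" a feature by setting it to a fixed baseline value, and all names (`parity_gap`, `interventional_contributions`, `toy_model`) are assumptions of this sketch.

```python
import numpy as np

def parity_gap(y_hat, z):
    """Absolute difference in positive-decision rates between groups z == 1 and z == 0."""
    y_hat = np.asarray(y_hat, dtype=float)
    z = np.asarray(z)
    return abs(y_hat[z == 1].mean() - y_hat[z == 0].mean())

def interventional_contributions(model, X, z, baseline):
    """Set one feature at a time to its baseline value, keep the others fixed,
    and report how much the disparity drops relative to the unmodified data."""
    X = np.asarray(X, dtype=float)
    base_gap = parity_gap(model(X), z)
    contributions = []
    for j in range(X.shape[1]):
        X_int = X.copy()
        X_int[:, j] = baseline[j]  # intervene on feature j only
        contributions.append(base_gap - parity_gap(model(X_int), z))
    return base_gap, contributions

def toy_model(X):
    # Hypothetical model: the decision depends only on the first feature.
    return (X[:, 0] > 0.5).astype(int)

z = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([
    [0, 0, 0, 1, 1, 1, 1, 0],  # feature 0: related to z, used by the model
    [0, 0, 0, 1, 1, 1, 1, 0],  # feature 1: identical copy, ignored by the model
])

base_gap, contrib = interventional_contributions(toy_model, X, z, baseline=[0.0, 0.0])
print(base_gap, contrib)
```

In this toy setup the intervention attributes the whole gap to feature 0 and nothing to feature 1, even though the two columns are identical in the data. The sketch also requires calling `model` on modified inputs, which is exactly the access the abstract says may be unavailable.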
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference
