Distribution-Based Feature Attribution for Explaining the Predictions of Any Classifier
Xinpeng Li, Kai Ming Ting

TL;DR
This paper introduces a formal problem definition for feature attribution and proposes DFAX, a novel, model-agnostic, distribution-based method that explains classifier predictions more effectively and efficiently than state-of-the-art baselines.
Contribution
The paper formalizes the feature attribution problem and introduces DFAX, the first method to explain predictions based on the data distribution, addressing limitations of existing approaches.
Findings
DFAX outperforms state-of-the-art baselines in effectiveness.
DFAX is more efficient than existing methods.
Many existing methods do not meet the formal criteria for feature attribution.
Abstract
The proliferation of complex, black-box AI models has intensified the need for techniques that can explain their decisions. Feature attribution methods have become a popular solution for providing post-hoc explanations, yet the field has historically lacked a formal problem definition. This paper addresses this gap by introducing a formal definition for the problem of feature attribution, which stipulates that explanations be supported by an underlying probability distribution represented by the given dataset. Our analysis reveals that many existing model-agnostic methods fail to meet this criterion, while even those that do often possess other limitations. To overcome these challenges, we propose Distributional Feature Attribution eXplanations (DFAX), a novel, model-agnostic method for feature attribution. DFAX is the first feature attribution method to explain classifier predictions…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Bayesian Modeling and Causal Inference
