CID: Measuring Feature Importance Through Counterfactual Distributions
Eddie Conti, Álvaro Parafita, Axel Brando

TL;DR
This paper introduces CID, a new post-hoc local feature importance method that uses counterfactual distributions and a rigorous dissimilarity measure to provide more faithful explanations of machine learning models.
Contribution
The paper presents a novel feature importance measure based on counterfactual distributions and a mathematically grounded dissimilarity metric, improving faithfulness over existing methods.
Findings
CID improves faithfulness metrics for explanations
It offers a complementary perspective to existing explainers
The method is validated against established local importance methods
Abstract
Assessing the importance of individual features in Machine Learning is critical to understand the model's decision-making process. While numerous methods exist, the lack of a definitive ground truth for comparison highlights the need for alternative, well-founded measures. This paper introduces a novel post-hoc local feature importance method called Counterfactual Importance Distribution (CID). We generate two sets of positive and negative counterfactuals, model their distributions using Kernel Density Estimation, and rank features based on a distributional dissimilarity measure. This measure, grounded in a rigorous mathematical framework, satisfies key properties required to function as a valid metric. We showcase the effectiveness of our method by comparing with well-established local feature importance explainers. Our method not only offers complementary perspectives to existing…
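The pipeline the abstract describes (two counterfactual sets, a density estimate per set, a dissimilarity score per feature, then a ranking) can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation. The counterfactual sets here are synthetic placeholder samples, because the excerpt does not say how counterfactuals are generated. The KDE is a hand-written one-dimensional Gaussian KDE with a normal-reference bandwidth. The L1 distance between densities is used as a stand-in dissimilarity and is not the paper's measure. The names `gaussian_kde_1d`, `l1_dissimilarity`, and `rank_features` are hypothetical.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a 1-D Gaussian KDE of `samples` at the points in `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb (assumption); floor avoids a zero bandwidth.
        bandwidth = max(1.06 * samples.std(ddof=1) * n ** (-1 / 5), 1e-3)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel_sum = np.exp(-0.5 * z**2).sum(axis=1)
    return kernel_sum / (n * bandwidth * np.sqrt(2 * np.pi))


def l1_dissimilarity(p, q, grid):
    """Stand-in dissimilarity (NOT the paper's measure): L1 distance on a uniform grid."""
    dx = grid[1] - grid[0]
    return float(np.sum(np.abs(p - q)) * dx)


def rank_features(pos_cf, neg_cf, grid_size=512):
    """Score each feature by how much its two counterfactual densities differ."""
    scores = []
    for j in range(pos_cf.shape[1]):
        a, b = pos_cf[:, j], neg_cf[:, j]
        # Pad the grid so the tails of both densities are covered.
        pad = 3.0 * max(a.std(), b.std(), 1e-2)
        lo = min(a.min(), b.min()) - pad
        hi = max(a.max(), b.max()) + pad
        grid = np.linspace(lo, hi, grid_size)
        p = gaussian_kde_1d(a, grid)
        q = gaussian_kde_1d(b, grid)
        scores.append(l1_dissimilarity(p, q, grid))
    scores = np.array(scores)
    order = np.argsort(-scores)  # most dissimilar feature first
    return order, scores


# Synthetic placeholder data: feature 0 differs between the two sets, feature 1 does not.
rng = np.random.default_rng(0)
pos_cf = np.column_stack([rng.normal(2.0, 1.0, 300), rng.normal(0.0, 1.0, 300)])
neg_cf = np.column_stack([rng.normal(-2.0, 1.0, 300), rng.normal(0.0, 1.0, 300)])

order, scores = rank_features(pos_cf, neg_cf)
print("feature ranking:", order)
print("dissimilarity scores:", scores)
```

In this toy setup the feature whose distribution shifts between the two sets receives the larger score and is ranked first. Any dissimilarity that satisfies the metric properties mentioned in the abstract could be substituted for `l1_dissimilarity` without changing the rest of the sketch.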
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
