PredDiff: Explanations and Interactions from Conditional Expectations
Stefan Blücher, Johanna Vielhaben, Nils Strodthoff

TL;DR
PredDiff is a theoretically grounded, model-agnostic method for local feature attribution in both classification and regression. This work extends it with a new measure for interactions between arbitrary feature subsets.
Contribution
The paper introduces a new, well-founded measure within PredDiff for interaction effects between arbitrary feature subsets. It also clarifies PredDiff's connection to Shapley values and points out that classification and regression each require specific treatment.
Findings
PredDiff provides reliable, numerically inexpensive attributions.
The new interaction measure quantifies interaction effects between arbitrary feature subsets.
PredDiff's connection to Shapley values clarifies its theoretical foundation.
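A small sketch makes the Shapley connection concrete. Assume a value function v(S) = E[f(X) | X_S = x_S]. Under it, the PredDiff relevance of feature i is the single marginal contribution v(N) - v(N \ {i}). The Shapley value instead averages the contributions v(S ∪ {i}) - v(S) over all coalitions S that exclude i.

Several choices here are hypothetical and serve only as illustration; this is not the paper's code:
- the toy model;
- the independent, uniform {0, 1} features, which make the conditional expectation a plain average that can be enumerated exactly;
- the helper names.

```python
from itertools import combinations, product
from math import factorial

N_FEATURES = 3


def model(x):
    # Hypothetical toy regression model: main effects plus an x0*x1 interaction.
    return 2 * x[0] + x[1] + 3 * x[0] * x[1] + 0.5 * x[2]


def value(x, kept):
    """v(S) = E[f(X) | X_S = x_S], assuming independent uniform {0, 1} features."""
    free = [j for j in range(N_FEATURES) if j not in kept]
    total = 0.0
    # Enumerate every completion of the marginalized features (exact expectation).
    for values in product((0, 1), repeat=len(free)):
        z = list(x)
        for j, v in zip(free, values):
            z[j] = v
        total += model(z)
    return total / 2 ** len(free)


def preddiff(x, i):
    """PredDiff relevance: remove feature i from the full coalition only."""
    everyone = set(range(N_FEATURES))
    return value(x, everyone) - value(x, everyone - {i})


def shapley(x, i):
    """Exact Shapley value: weighted average over all coalitions S without i."""
    others = [j for j in range(N_FEATURES) if j != i]
    phi = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(N_FEATURES - size - 1) / factorial(N_FEATURES)
        for subset in combinations(others, size):
            s = set(subset)
            phi += weight * (value(x, s | {i}) - value(x, s))
    return phi


x = (1, 1, 1)
for i in range(N_FEATURES):
    print(i, preddiff(x, i), round(shapley(x, i), 6))
```

For x = (1, 1, 1), the two attributions agree on feature 2, which enters the model additively (0.25 each). They differ on features 0 and 1: PredDiff gives 2.5 and 2.0, while Shapley gives 2.125 and 1.625. PredDiff assigns the full centered interaction term 3(x0 - 0.5)(x1 - 0.5) = 0.75 to each partner, whereas Shapley splits it evenly. As a result, only the Shapley values sum to f(x) - E[f(X)] = 4.0.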
Abstract
PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure prediction changes while marginalizing features. In this work, we clarify properties of PredDiff and its close connection to Shapley values. We stress important differences between classification and regression, which require a specific treatment within both formalisms. We extend PredDiff by introducing a new, well-founded measure for interaction effects between arbitrary feature subsets. The study of interaction effects represents an inevitable step towards a comprehensive understanding of black-box models and is particularly important for science applications. Equipped with our novel interaction measure, PredDiff is a promising model-agnostic approach for obtaining reliable, numerically inexpensive and theoretically sound attributions.
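The marginalization idea can be sketched in three steps, giving m^Y(x) = f(x) - E[f(X) | X_rest = x_rest]:
1. Replace the features in a set Y with draws from their distribution given the remaining features.
2. Average the model output over those draws.
3. Subtract that average from the original prediction.

For the interaction between feature sets f and g, the sketch assumes the inclusion-exclusion form m^{fg}_int = m^{fg} - m^f - m^g; the paper should be consulted for its exact definition. The toy model, the independent N(0, 1) sampler standing in for a conditional imputer, and all function names are hypothetical. The sketch also covers only the regression case, because the paper stresses that classification needs specific treatment.

```python
import random

rng = random.Random(0)


def model(x):
    # Hypothetical toy regression model with an x0 * x1 interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[1]


def sampler(features):
    """Assumed imputer: features are independent N(0, 1), so the conditional
    distribution of the marginalized features equals their marginal."""
    return [rng.gauss(0.0, 1.0) for _ in features]


def marginalized_prediction(x, features, n_samples=20000):
    """Monte Carlo estimate of E[f(X) | X_rest = x_rest] with `features` resampled."""
    total = 0.0
    for _ in range(n_samples):
        z = list(x)
        for j, sampled in zip(features, sampler(features)):
            z[j] = sampled
        total += model(z)
    return total / n_samples


def relevance(x, features):
    """PredDiff relevance m^Y(x) = f(x) - E[f(X) | x_rest] for the feature set Y."""
    return model(x) - marginalized_prediction(x, features)


def interaction(x, f_set, g_set):
    """Assumed inclusion-exclusion form: joint relevance minus the individual ones."""
    return relevance(x, f_set + g_set) - relevance(x, f_set) - relevance(x, g_set)


x = [1.0, 2.0]
print(round(relevance(x, [0]), 2), round(relevance(x, [1]), 2), round(interaction(x, [0], [1]), 2))
```

For x = [1.0, 2.0], the analytic answers are m^0 = m^1 = 8 and an interaction of -6 (that is, -3 · x0 · x1), so the printed Monte Carlo estimates should land close to those values. The negative sign arises because each single-feature relevance already contains the full 3 · x0 · x1 term, while the joint relevance counts it only once.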
