Informative Post-Hoc Explanations Only Exist for Simple Functions
Eric Günther, Balázs Szabados, Robi Bhattacharjee, Sebastian Bordt, Ulrike von Luxburg

TL;DR
This paper presents a theoretical framework showing that many popular post-hoc explanation algorithms are not informative for complex models, and discusses conditions and modifications for improving their informativeness.
Contribution
It introduces a learning-theory-based framework for evaluating explanation informativeness and rigorously demonstrates limitations of existing algorithms for complex decision functions.
Findings
Many explanation algorithms are non-informative for complex models
Gradient and counterfactual explanations are non-informative for differentiable functions
SHAP and anchor explanations are non-informative for decision trees
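For intuition about the gradient finding above, here is a toy sketch. It is not the paper's construction or its formal definition of informativeness. The functions `f_simple` and `f_complex`, the helper `numerical_gradient`, and the evaluation points are all hypothetical choices made for illustration. The sketch shows two differentiable functions that share the same value and gradient at a point but differ substantially away from it, so the gradient at that point alone cannot tell them apart.

```python
import math


def numerical_gradient(f, x0, h=1e-5):
    """Central-difference estimate of the derivative of f at x0."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)


def f_simple(x):
    # Hypothetical "simple" model: a linear function with slope 2.
    return 2.0 * x


def f_complex(x):
    # Hypothetical "complex" model: same value and slope at x = 0,
    # but with an added oscillating term that grows away from 0.
    return 2.0 * x + 5.0 * x ** 2 * math.sin(3.0 * x)


x0 = 0.0
grad_simple = numerical_gradient(f_simple, x0)
grad_complex = numerical_gradient(f_complex, x0)

# The two "gradient explanations" at x0 agree up to numerical error...
print("gradient of f_simple at x0: ", grad_simple)
print("gradient of f_complex at x0:", grad_complex)

# ...while the functions themselves disagree away from x0.
gap_far_away = abs(f_simple(2.0) - f_complex(2.0))
print("gap between the functions at x = 2:", gap_far_away)
```

In this toy setting, a reader who sees only the gradient at `x0` cannot rule out either function, which is the flavor of non-informativeness the finding describes. The paper's actual statement is made within its learning-theoretic framework rather than through a single pair of functions.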
Abstract
Many researchers have suggested that local post-hoc explanation algorithms can be used to gain insights into the behavior of complex machine learning models. However, theoretical guarantees about such algorithms only exist for simple decision functions, and it is unclear whether and under which assumptions similar results might exist for complex models. In this paper, we introduce a general, learning-theory-based framework for what it means for an explanation to provide information about a decision function. We call an explanation informative if it serves to reduce the complexity of the space of plausible decision functions. With this approach, we show that many popular explanation algorithms are not informative when applied to complex decision functions, providing a rigorous mathematical rejection of the idea that it should be possible to explain any model. We then derive conditions…
