Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation
Julius Adebayo, Michael Muelly, Hal Abelson, Been Kim

TL;DR
This paper evaluates whether post hoc explanation methods can detect a model's reliance on unknown spurious correlations, and finds that they largely fail, especially when the spurious signal is not visually obvious.
Contribution
It introduces an empirical methodology and a suite of metrics for assessing how reliably an explanation method detects spurious signals, and shows that the tested methods are ineffective when the spurious artifact is unknown at test time.
Findings
Post hoc explanations are ineffective for detecting spurious signals that are unknown at test time.
Feature attribution can falsely indicate reliance on spurious signals.
Methods struggle with non-visible artifacts like background blur.
Abstract
We investigate whether three types of post hoc model explanations--feature attribution, concept activation, and training point ranking--are effective for detecting a model's reliance on spurious signals in the training data. Specifically, we consider the scenario where the spurious signal to be detected is unknown, at test-time, to the user of the explanation method. We design an empirical methodology that uses semi-synthetic datasets along with pre-specified spurious artifacts to obtain models that verifiably rely on these spurious training signals. We then provide a suite of metrics that assess an explanation method's reliability for spurious signal detection under various conditions. We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test-time, especially for non-visible artifacts like a background blur. Further, we find that…
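To make the idea of a semi-synthetic dataset with a pre-specified spurious artifact concrete, here is a minimal sketch. It is not the authors' pipeline: the function name `inject_spurious_patch`, the corner-patch artifact, and the random toy images are all assumptions chosen for illustration. The sketch stamps a bright square onto every image of one class, so that a model trained on the result could learn to use the square instead of the image content.

```python
import numpy as np


def inject_spurious_patch(images, labels, target_class, patch_size=4, value=1.0):
    """Return a copy of `images` in which every image of `target_class`
    carries a bright square in its top-left corner.

    Hypothetical helper: the patch stands in for a visible spurious artifact.
    `images` has shape (N, H, W); `labels` has shape (N,).
    """
    out = images.copy()
    mask = labels == target_class          # boolean selector over the N images
    out[mask, :patch_size, :patch_size] = value
    return out


# Toy data (assumption): six 16x16 grayscale images with intensities below 0.5,
# so the injected patch (value 1.0) is unambiguous.
rng = np.random.default_rng(0)
images = rng.uniform(0.0, 0.5, size=(6, 16, 16))
labels = np.array([0, 1, 0, 1, 1, 0])

spurious_images = inject_spurious_patch(images, labels, target_class=1)
```

Because the artifact is perfectly aligned with one label, a model's reliance on it can be checked by comparing predictions on the same inputs with and without the patch. A non-visible artifact such as a background blur would need a different transformation in place of the patch.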
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications · Cell Image Analysis Techniques
Methods: High-Order Consensuses
