Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation

Julius Adebayo; Michael Muelly; Hal Abelson; Been Kim

arXiv:2212.04629 · cs.LG · December 12, 2022 · 24 cites

TL;DR

This paper evaluates whether post hoc explanation methods can detect unknown spurious correlations in models, and finds them limited, especially when the spurious signal is not visually obvious.

Contribution

It introduces an empirical methodology and a suite of metrics for assessing how reliably explanation methods detect spurious signals, and shows that the methods tested are ineffective when the spurious artifact is unknown at test time.

Findings

01. Post hoc explanations are ineffective for unknown spurious signals.

02. Feature attribution can falsely indicate reliance on spurious signals.

03. Methods struggle with non-visible artifacts like background blur.

Abstract

We investigate whether three types of post hoc model explanations--feature attribution, concept activation, and training point ranking--are effective for detecting a model's reliance on spurious signals in the training data. Specifically, we consider the scenario where the spurious signal to be detected is unknown, at test-time, to the user of the explanation method. We design an empirical methodology that uses semi-synthetic datasets along with pre-specified spurious artifacts to obtain models that verifiably rely on these spurious training signals. We then provide a suite of metrics that assess an explanation method's reliability for spurious signal detection under various conditions. We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test-time, especially for non-visible artifacts like a background blur. Further, we find that…
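The semi-synthetic setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the corner "tag" artifact, the function names (`add_tag`, `make_spurious_dataset`, `spurious_reliance`), and the prediction-flip check are all assumptions chosen for illustration. The idea is to stamp a pre-specified artifact onto images of one class, then verify reliance by checking whether adding the artifact to other images changes the model's prediction.

```python
import numpy as np


def add_tag(image, size=4, value=1.0):
    """Stamp a bright square in the top-left corner (hypothetical visible artifact)."""
    out = image.copy()
    out[:size, :size] = value
    return out


def make_spurious_dataset(images, labels, spurious_class, p=1.0, seed=0):
    """Copy `images`, tagging roughly a fraction `p` of the `spurious_class` samples."""
    rng = np.random.default_rng(seed)
    out = images.copy()
    tagged = np.zeros(len(images), dtype=bool)
    for i, y in enumerate(labels):
        if y == spurious_class and rng.random() < p:
            out[i] = add_tag(out[i])
            tagged[i] = True
    return out, tagged


def spurious_reliance(predict, images, spurious_class):
    """Fraction of images whose prediction flips to `spurious_class` once tagged.

    `predict` is any callable mapping a batch of images to integer class labels
    (an assumption; a trained model's argmax output would play this role).
    """
    before = predict(images)
    after = predict(np.stack([add_tag(x) for x in images]))
    eligible = before != spurious_class
    if not eligible.any():
        return 0.0
    return float(np.mean(after[eligible] == spurious_class))


# Toy usage: a stand-in "model" that looks only at the tagged corner.
images = np.zeros((10, 8, 8))
labels = np.array([0, 1] * 5)
train_images, tagged = make_spurious_dataset(images, labels, spurious_class=1)
corner_model = lambda X: (X[:, :4, :4].mean(axis=(1, 2)) > 0.9).astype(int)
reliance = spurious_reliance(corner_model, images, spurious_class=1)
```

A reliance score near 1 indicates that the stand-in model's output is driven by the artifact, which is the kind of ground truth an explanation method would then be asked to recover. A non-visible artifact such as a background blur would replace `add_tag` with a different transformation.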

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications · Cell Image Analysis Techniques

Methods: High-Order Consensuses