Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations?
Susu Sun, Lisa M. Koch, Christian F. Baumgartner

TL;DR
This paper evaluates the effectiveness of interpretable machine learning techniques in detecting spurious correlations in neural network models, highlighting the strengths of SHAP and Attri-Net in identifying faulty reasoning.
Contribution
The paper introduces a rigorous evaluation strategy for explanation methods and uses it to compare five post-hoc explanation techniques and one inherently interpretable method on their ability to detect artificially added confounders.
Findings
SHAP and Attri-Net reliably identify faulty model behavior
Post-hoc explanations often struggle to detect spurious correlations
Evaluation strategy helps assess explanation techniques' effectiveness
Abstract
While deep neural network models offer unmatched classification performance, they are prone to learning spurious correlations in the data. Such dependencies on confounding information can be difficult to detect using performance metrics if the test data comes from the same distribution as the training data. Interpretable ML methods such as post-hoc explanations or inherently interpretable classifiers promise to identify faulty model reasoning. However, there is mixed evidence whether many of these techniques are actually able to do so. In this paper, we propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations. Using this strategy, we evaluate five post-hoc explanation techniques and one inherently interpretable method for their ability to detect three types of artificially added confounders in a chest x-ray diagnosis…
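The evaluation idea described in the abstract, adding a known artificial confounder and checking whether an explanation points to it, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the square-patch confounder, the function names `add_patch_confounder` and `attribution_mass_in_mask`, and the "fraction of attribution mass inside the confounder region" score are hypothetical and are not taken from the paper, whose actual confounder types and metrics are not given in this summary.

```python
import numpy as np


def add_patch_confounder(image, top, left, size, value=1.0):
    """Return a copy of `image` with a square patch set to `value`,
    together with a boolean mask marking the patch (hypothetical confounder)."""
    confounded = image.copy()
    confounded[top:top + size, left:left + size] = value
    mask = np.zeros(image.shape, dtype=bool)
    mask[top:top + size, left:left + size] = True
    return confounded, mask


def attribution_mass_in_mask(attribution, mask):
    """Fraction of the total absolute attribution that lies inside `mask`.

    A value near 1 means the explanation concentrates on the confounder;
    a value near mask.mean() means it is no more focused than a uniform map.
    """
    magnitude = np.abs(attribution)
    total = magnitude.sum()
    if total == 0:
        return 0.0
    return float(magnitude[mask].sum() / total)


# Toy stand-in for a grayscale image (not real chest x-ray data).
rng = np.random.default_rng(0)
image = rng.random((64, 64))
confounded, mask = add_patch_confounder(image, top=2, left=2, size=8)

# Two toy attribution maps: one focused on the patch, one uniform.
focused_map = np.zeros((64, 64))
focused_map[mask] = 1.0
uniform_map = np.ones((64, 64))

focused_score = attribution_mass_in_mask(focused_map, mask)
uniform_score = attribution_mass_in_mask(uniform_map, mask)
print(focused_score, uniform_score)
```

In practice the attribution map would come from an explanation method applied to a model trained on confounded data; the toy maps above only show how such a score separates an explanation that highlights the confounder from one that does not.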
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Topic Modeling
Methods: Shapley Additive Explanations
