Reproducibility study on how to find Spurious Correlations, Shortcut Learning, Clever Hans or Group-Distributional non-robustness and how to fix them
Ole Delzer, Sidney Bender

TL;DR
This study compares methods for detecting and fixing spurious correlations in deep neural networks. It highlights the effectiveness of explainable AI (XAI) techniques and the obstacles that limited data and annotations pose for practical deployment.
Contribution
The study unifies diverse perspectives on model robustness, evaluates correction methods under limited data availability and severe subgroup imbalance, and identifies key limitations of current approaches.
Findings
XAI-based methods outperform non-XAI approaches
Counterfactual Knowledge Distillation (CFKD) is most effective
Manual group annotation is often infeasible and hampers method application
Abstract
Deep Neural Networks (DNNs) are increasingly utilized in high-stakes domains like medical diagnostics and autonomous driving where model reliability is critical. However, the research landscape for ensuring this reliability is terminologically fractured across communities that pursue the same goal of ensuring models rely on causally relevant features rather than confounding signals. While frameworks such as distributionally robust optimization (DRO), invariant risk minimization (IRM), shortcut learning, simplicity bias, and the Clever Hans effect all address model failure due to spurious correlations, researchers typically only reference work within their own domains. This reproducibility study unifies these perspectives through a comparative analysis of correction methods under challenging constraints like limited data availability and severe subgroup imbalance. We evaluate recently…
