Mitigating Clever Hans Strategies in Image Classifiers through Generating Counterexamples
Sidney Bender, Ole Delzer, Jan Herrmann, Heike Antje Marxfeld, Klaus-Robert Müller, Grégoire Montavon

TL;DR
This paper introduces Counterfactual Knowledge Distillation (CFKD), a framework that generates counterfactuals to improve the robustness of image classifiers against spurious correlations without needing group labels.
Contribution
CFKD enables robust model training without confounder labels: it generates diverse counterfactuals that enrich the training data and correct the model's decision boundaries, and it scales to multiple confounders.
Findings
CFKD outperforms existing methods on five datasets.
It achieves balanced generalization across groups.
It is particularly effective in low-data regimes with spurious correlations.
Abstract
Deep learning models remain vulnerable to spurious correlations, leading to so-called Clever Hans predictors that undermine robustness even in large-scale foundation and self-supervised models. Group distributional robustness methods, such as Deep Feature Reweighting (DFR), rely on explicit group labels to upweight underrepresented subgroups, but face key limitations: (1) group labels are often unavailable, (2) low within-group sample sizes hinder coverage of the subgroup distribution, and (3) performance degrades sharply when multiple spurious correlations fragment the data into even smaller groups. We propose Counterfactual Knowledge Distillation (CFKD), a framework that sidesteps these issues by generating diverse counterfactuals, enabling a human annotator to efficiently explore and correct the model's decision boundaries through a knowledge distillation step. Unlike DFR, our method…
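Limitation (3) in the abstract can be made concrete with a little arithmetic. The sketch below is not taken from the paper; it assumes, purely for illustration, that every confounder is binary and that a group is one combination of class label and confounder values. The function names and the dataset size of 1000 are hypothetical. Under those assumptions the number of groups doubles with each added confounder, so the average number of samples per group halves. The average is an optimistic figure: under a spurious correlation the minority groups hold far fewer samples than the average.

```python
def num_groups(num_classes: int, num_binary_confounders: int) -> int:
    """Count groups, assuming one group per (class, confounder values) combination.

    Assumption for illustration only: every confounder is binary.
    """
    return num_classes * 2 ** num_binary_confounders


def avg_samples_per_group(num_samples: int, num_classes: int,
                          num_binary_confounders: int) -> float:
    """Average group size if the samples were spread evenly over all groups."""
    return num_samples / num_groups(num_classes, num_binary_confounders)


if __name__ == "__main__":
    # Hypothetical dataset: 1000 samples, 2 classes, 0 to 3 binary confounders.
    for k in range(4):
        groups = num_groups(2, k)
        average = avg_samples_per_group(1000, 2, k)
        print(f"confounders={k}  groups={groups}  avg samples per group={average}")
```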
