The Clever Hans Effect in Unsupervised Learning
Jacob Kauffmann, Jonas Dippel, Lukas Ruff, Wojciech Samek, Klaus-Robert Müller, Grégoire Montavon

TL;DR
This paper investigates how prevalent the Clever Hans effect is in unsupervised learning models. Using Explainable AI techniques, it finds the effect to be widespread, and it offers insight into its causes and possible mitigation strategies.
Contribution
The paper is the first to demonstrate empirically that the Clever Hans effect occurs in unsupervised learning. It traces the effect to inductive biases of the unsupervised learning model, offering a new perspective on model robustness.
Findings
Clever Hans effects are widespread in unsupervised models.
Explainable AI techniques can identify these effects.
Inductive biases are a primary source of the problem.
Abstract
Unsupervised learning has become an essential building block of AI systems. The representations it produces, e.g. in foundation models, are critical to a wide variety of downstream applications. It is therefore important to carefully examine unsupervised models to ensure not only that they produce accurate predictions, but also that these predictions are not "right for the wrong reasons", the so-called Clever Hans (CH) effect. Using specially developed Explainable AI techniques, we show for the first time that CH effects are widespread in unsupervised learning. Our empirical findings are enriched by theoretical insights, which interestingly point to inductive biases in the unsupervised learning machine as a primary source of CH effects. Overall, our work sheds light on unexplored risks associated with practical applications of unsupervised learning and suggests ways to make unsupervised…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
