TL;DR
This paper introduces a scalable framework that uses explanation methods to detect and mitigate Clever Hans behavior in deep vision models, with the aim of fairer and more reliable predictors.
Contribution
It builds on Spectral Relevance Analysis, a recent technique, to quantify artifacts at scale, and introduces Class Artifact Compensation (ClArC) to reduce Clever Hans behavior in models trained on large datasets.
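To make the idea behind Spectral Relevance Analysis concrete, here is a minimal sketch that spectrally clusters attribution heatmaps and uses the eigengap to guess how many distinct prediction strategies are present. It is a toy under stated assumptions and not the paper's implementation. The function names, the Gaussian affinity with a median bandwidth, the greedy seeding used in place of k-means, and the synthetic heatmaps are all illustrative choices that do not come from the source.

```python
import numpy as np

def make_toy_heatmaps(seed=0, n_object=40, n_artifact=20, size=8):
    """Hypothetical data: relevance on the object (center) vs. on an artifact (corner)."""
    rng = np.random.default_rng(seed)
    maps = 0.05 * rng.standard_normal((n_object + n_artifact, size, size))
    maps[:n_object, 3:5, 3:5] += 1.0   # "valid" strategy: relevance on the object
    maps[n_object:, 0:2, 0:2] += 1.0   # "Clever Hans" strategy: relevance on a corner artifact
    return maps

def laplacian_spectrum(heatmaps):
    """Eigenvalues (ascending) and eigenvectors of the normalized graph Laplacian."""
    X = heatmaps.reshape(len(heatmaps), -1)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Assumption: Gaussian affinity with the median squared distance as bandwidth.
    A = np.exp(-d2 / np.median(d2))
    np.fill_diagonal(A, 0.0)
    deg = A.sum(axis=1)
    L = np.eye(len(X)) - A / np.sqrt(np.outer(deg, deg))
    return np.linalg.eigh(L)

def estimate_n_clusters(vals, max_clusters=10):
    """Eigengap heuristic: the largest jump among the smallest eigenvalues."""
    gaps = np.diff(vals[: max_clusters + 1])
    return int(np.argmax(gaps)) + 1

def assign_clusters(vecs, n_clusters):
    """Label samples from the row-normalized spectral embedding."""
    U = vecs[:, :n_clusters]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Greedy seeding (a simple stand-in for k-means): start at row 0, then
    # repeatedly add the row least similar to the seeds chosen so far.
    seeds = [0]
    for _ in range(n_clusters - 1):
        sim = np.abs(U @ U[seeds].T).max(axis=1)
        seeds.append(int(np.argmin(sim)))
    return np.argmax(U @ U[seeds].T, axis=1)

maps = make_toy_heatmaps()
vals, vecs = laplacian_spectrum(maps)
k = estimate_n_clusters(vals)
labels = assign_clusters(vecs, k)
```

In this toy setting, a small cluster whose mean heatmap concentrates on an image corner rather than on the object is the kind of candidate artifact an analyst would then inspect. In practice the heatmaps would come from an explanation method applied to a trained model.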
Findings
Effective detection of spurious correlations in models.
ClArC significantly reduces Clever Hans behavior.
Improved model fairness and robustness.
Abstract
Contemporary learning models for computer vision are typically trained on very large (benchmark) datasets with millions of samples. These may, however, contain biases, artifacts, or errors that have gone unnoticed and are exploitable by the model. In the worst case, the trained model does not learn a valid and generalizable strategy to solve the problem it was trained for, and becomes a 'Clever-Hans' (CH) predictor that bases its decisions on spurious correlations in the training data, potentially yielding an unrepresentative or unfair, and possibly even hazardous predictor. In this paper, we contribute by providing a comprehensive analysis framework based on a scalable statistical analysis of attributions from explanation methods for large data corpora. Based on a recent technique - Spectral Relevance Analysis - we propose the following technical contributions and resulting findings:…
