Finding and Removing Clever Hans: Using Explanation Methods to Debug and Improve Deep Models

Christopher J. Anders; Leander Weber; David Neumann; Wojciech Samek; Klaus-Robert Müller; Sebastian Lapuschkin

arXiv:1912.11425 · cs.CV · December 22, 2020

TL;DR

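This paper introduces a scalable framework that statistically analyzes attributions from explanation methods to detect and mitigate Clever Hans behavior (reliance on spurious correlations in the training data) in deep vision models, with the aim of fairer and more reliable predictors.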

Contribution

Building on Spectral Relevance Analysis (SpRAy), a recent technique for identifying artifacts from attributions, the paper introduces Class Artifact Compensation (ClArC) to reduce Clever Hans behavior in models trained on large datasets.
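
To make the Spectral Relevance Analysis step more concrete, here is a minimal sketch of the general idea: spectral clustering of attribution maps followed by an eigengap heuristic. It is an illustration under stated assumptions, not the authors' implementation. The function names (`spray_spectrum`, `eigengap_clusters`), the Gaussian affinity with a k-th-neighbor bandwidth, and the synthetic 8x8 "heatmaps" are all hypothetical choices made for this example. In practice the maps would come from an explanation method applied to a trained model.

```python
import numpy as np

def spray_spectrum(heatmaps, k=5):
    """Eigen-decompose the normalized graph Laplacian of a set of attribution maps.

    Assumption: a dense Gaussian affinity is used here for simplicity; the
    bandwidth is the mean squared distance to each sample's k-th neighbor.
    """
    X = np.asarray(heatmaps, dtype=float).reshape(len(heatmaps), -1)
    # Pairwise squared Euclidean distances between flattened maps.
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    # Column 0 of each sorted row is the sample itself, so column k is the k-th neighbor.
    sigma2 = np.sort(d2, axis=1)[:, k].mean()
    W = np.exp(-d2 / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigh(L)  # eigenvalues in ascending order

def eigengap_clusters(eigvals, max_clusters=8):
    """Pick the cluster count at the largest gap among the smallest eigenvalues."""
    gaps = np.diff(eigvals[:max_clusters + 1])
    return int(np.argmax(gaps)) + 1

# Synthetic stand-ins for attribution maps (hypothetical data):
# 20 maps put relevance on a corner patch (artifact-like strategy),
# 20 maps put relevance on the image center (object-like strategy).
rng = np.random.default_rng(0)
maps = 0.05 * rng.standard_normal((40, 8, 8))
maps[:20, :2, :2] += 1.0
maps[20:, 3:5, 3:5] += 1.0

eigvals, eigvecs = spray_spectrum(maps)
n_clusters = eigengap_clusters(eigvals)
# With two well-separated groups, the sign of the second eigenvector splits them.
labels = (eigvecs[:, 1] > 0).astype(int)
```

A pronounced eigengap suggests that the attributions fall into distinct prediction strategies. A small, tight cluster whose relevance sits on a region unrelated to the object is a candidate Clever Hans artifact and would still need manual inspection.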

Findings

1. Effective detection of spurious correlations in models.
2. ClArC significantly reduces Clever Hans behavior.
3. Improved model fairness and robustness.

Abstract

Contemporary learning models for computer vision are typically trained on very large (benchmark) datasets with millions of samples. These may, however, contain biases, artifacts, or errors that have gone unnoticed and are exploitable by the model. In the worst case, the trained model does not learn a valid and generalizable strategy to solve the problem it was trained for, and becomes a 'Clever-Hans' (CH) predictor that bases its decisions on spurious correlations in the training data, potentially yielding an unrepresentative or unfair, and possibly even hazardous predictor. In this paper, we contribute by providing a comprehensive analysis framework based on a scalable statistical analysis of attributions from explanation methods for large data corpora. Based on a recent technique - Spectral Relevance Analysis - we propose the following technical contributions and resulting findings:…
