Towards Fixing Clever-Hans Predictors with Counterfactual Knowledge Distillation
Sidney Bender, Christopher J. Anders, Pattarawatt Chormai, Heike Marxfeld, Jan Herrmann, Grégoire Montavon

TL;DR
This paper presents counterfactual knowledge distillation (CFKD), a technique that uses human feedback to detect and remove a deep learning model's reliance on confounders, improving reliability, especially in safety-critical applications.
Contribution
It introduces CFKD, a novel method that uses counterfactual explanations and human feedback to remove a model's reliance on confounders, together with a new evaluation metric that correlates better with true test performance than validation accuracy.
Findings
CFKD effectively reduces confounder reliance on synthetically augmented datasets.
CFKD improves model robustness on real-world histopathological data.
The new metric correlates better with true test performance than validation accuracy does.
Abstract
This paper introduces a novel technique called counterfactual knowledge distillation (CFKD) to detect and remove reliance on confounders in deep learning models with the help of human expert feedback. Confounders are spurious features that models tend to rely on, which can result in unexpected errors in regulated or safety-critical domains. The paper highlights the benefit of CFKD in such domains and shows some advantages of counterfactual explanations over other types of explanations. We propose an experiment scheme to quantitatively evaluate the success of CFKD and different teachers that can give feedback to the model. We also introduce a new metric that is better correlated with true test performance than validation accuracy. The paper demonstrates the effectiveness of CFKD on synthetically augmented datasets and on real-world histopathological datasets.
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Machine Learning and Data Classification
Methods: Knowledge Distillation
