The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples

Timo Freiesleben

arXiv:2009.05487·cs.AI·November 3, 2021

TL;DR

This paper examines how counterfactual explanations (CEs) and adversarial examples (AEs) relate, identifies the properties that separate them, and proposes a common mathematical framework for analyzing both.

Contribution

The paper introduces a formal framework that distinguishes CEs from AEs by two properties, their relationship to the true label and their tolerance with respect to proximity, and shows how current methods for generating them are connected.

Findings

01

CEs and AEs can be generated using similar techniques.

02

Two properties distinguish CEs from AEs: the relationship to the true label and the tolerance with respect to proximity.

03

The CE and AE fields are expected to merge increasingly as the number of common use cases grows.

Abstract

The same method that creates adversarial examples (AEs) to fool image-classifiers can be used to generate counterfactual explanations (CEs) that explain algorithmic decisions. This observation has led researchers to consider CEs as AEs by another name. We argue that the relationship to the true label and the tolerance with respect to proximity are two properties that formally distinguish CEs and AEs. Based on these arguments, we introduce CEs, AEs, and related concepts mathematically in a common framework. Furthermore, we show connections between current methods for generating CEs and AEs, and estimate that the fields will merge more and more as the number of common use-cases grows.
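To illustrate the abstract's opening observation, here is a minimal sketch of one generic search procedure whose output could be read as either a CE or an AE. It is not the paper's formal framework: the logistic classifier, its weights `w` and `b`, the squared-error objective, and the function name `search_nearby_flip` are all assumptions chosen for illustration. The search itself only finds a nearby input that receives a different prediction. Which label the result deserves depends on the two properties the abstract names, the relationship to the true label and the tolerance with respect to proximity, and the code cannot check either of them.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba(x, w, b):
    """Probability of class 1 under a toy logistic classifier (assumed model)."""
    return sigmoid(np.dot(w, x) + b)


def search_nearby_flip(x, w, b, target=1.0, lam=0.1, lr=0.5, steps=500):
    """Gradient descent on (f(x') - target)^2 + lam * ||x' - x||^2.

    Hypothetical helper: the first term pushes the prediction toward
    `target`, the second keeps x' close to the original input x.
    """
    x_new = x.astype(float).copy()
    for _ in range(steps):
        p = predict_proba(x_new, w, b)
        # Gradient of the squared prediction error with respect to x_new.
        grad_pred = 2.0 * (p - target) * p * (1.0 - p) * w
        # Gradient of the squared-distance penalty with respect to x_new.
        grad_dist = 2.0 * lam * (x_new - x)
        x_new -= lr * (grad_pred + grad_dist)
    return x_new


# Illustrative values only; none of these come from the paper.
w = np.array([1.5, -2.0])
b = -0.5
x = np.array([0.0, 1.0])  # initially classified as class 0

x_alt = search_nearby_flip(x, w, b)
print("original prediction:", predict_proba(x, w, b))
print("altered prediction: ", predict_proba(x_alt, w, b))
print("distance moved:     ", np.linalg.norm(x_alt - x))
```

In this sketch the weight `lam` controls how strongly proximity is enforced. A larger value keeps `x_alt` closer to `x` but may prevent the prediction from changing at all.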
