Fairwashing Explanations with Off-Manifold Detergent

Christopher J. Anders; Plamen Pasliev; Ann-Kathrin Dombrowski; Klaus-Robert Müller; Pan Kessel

arXiv:2007.09969 · cs.LG · July 21, 2020 · 35 cites



TL;DR

This paper demonstrates that current explanation methods can be manipulated without changing classifier behavior, and proposes a modification to improve their robustness against such manipulations.

Contribution

The paper provides a theoretical and experimental analysis showing the manipulability of explanation maps and introduces a robustification method for existing explanation techniques.

Findings

01

Explanation maps can be arbitrarily manipulated without affecting classifier performance.

02

Current explanation methods are vulnerable to manipulations that change the classifier only off the data manifold.

03

The proposed modification of existing explanation methods improves their robustness against such manipulations.

Abstract

Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can act as proof for a sensible, fair and trustworthy decision-making process of the algorithm and thereby increase its acceptance by the end-users. In this paper, we show both theoretically and experimentally that these hopes are presently unfounded. Specifically, we show that, for any classifier $g$, one can always construct another classifier $\tilde{g}$ which has the same behavior on the data (same train, validation, and test error) but has arbitrarily manipulated explanation maps. We derive this statement theoretically using differential geometry and demonstrate it experimentally for various explanation methods, architectures, and datasets. Motivated by our theoretical insights, we then propose a modification of existing explanation methods which makes them significantly…
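The core idea can be illustrated with a deliberately simple toy case. This is our own assumption for illustration, not the paper's construction or experiments: the data lie exactly on a known plane in $\mathbb{R}^3$, $g$ is a linear score, and $\tilde{g}$ adds a term that depends only on the coordinate orthogonal to that plane. Because this term is zero at every data point, $g$ and $\tilde{g}$ produce identical outputs on the data, yet their gradient explanations differ. The names `C`, `grad` and `P` are hypothetical, and the projector `P` onto the plane is only a stand-in for removing the off-manifold component of an explanation. Real data manifolds are curved and must be estimated, so this is a sketch, not the paper's method.

```python
import numpy as np

# Toy data (assumption): 100 points lying exactly on the plane x[2] == 0 in R^3.
# The tangent space of this "manifold" is spanned by e0, e1; e2 is the normal direction.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=(100, 2)), np.zeros(100)])

w = np.array([1.0, -2.0, 0.0])  # weights of the original linear score g
C = 50.0  # strength of the off-manifold manipulation (hypothetical value)


def g(x):
    """Original classifier score: linear in the input."""
    return x @ w


def g_tilde(x):
    """Manipulated score: adds C * x[2], which is zero for every data point."""
    return g(x) + C * x[..., 2]


def grad(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at a single point x."""
    basis = np.eye(x.size)
    return np.array([(f(x + eps * b) - f(x - eps * b)) / (2 * eps) for b in basis])


# Orthogonal projector onto the tangent space, assumed known in this toy case.
P = np.diag([1.0, 1.0, 0.0])

x0 = X[0]
print("same outputs on data:", np.allclose(g(X), g_tilde(X)))
print("gradient of g:       ", grad(g, x0))
print("gradient of g_tilde: ", grad(g_tilde, x0))
print("projected, g:        ", P @ grad(g, x0))
print("projected, g_tilde:  ", P @ grad(g_tilde, x0))
```

In this toy case the raw gradient of `g_tilde` is dominated by the normal direction, a feature the data never vary in, while every prediction on `X` is unchanged. After projection the two explanations coincide, which shows why the off-manifold component is the part an attacker can freely choose.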


Code & Models

Repositories

fairwashing/fairwashing
PyTorch · Official

Videos

Fairwashing explanations with off-manifold detergent · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification