Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball

Andrew Elliott; Stephen Law; Chris Russell

arXiv:1912.09405·cs.CV·April 1, 2021

TL;DR

This paper introduces a perceptually regularized adversarial perturbation method that produces semi-sparse, semantically meaningful explanations highlighting objects in images, bridging counterfactual explanations and adversarial attacks.

Contribution

The paper proposes a regularization of adversarial perturbations based on the perceptual loss. The regularization improves interpretability by concentrating the perturbation on relevant image regions.

Findings

01. Effective on weak localization benchmarks
02. Improves insertion and deletion metrics
03. Enhances pointing game performance
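
For readers unfamiliar with the deletion metric named in the findings, the following is a minimal sketch of the general idea: pixels are removed in order of decreasing saliency, the classifier score is recorded after each removal, and the area under the resulting curve is reported (lower is better, since a good explanation removes the decisive pixels first). The function name `deletion_auc`, the flat-list image representation, the zero baseline, and the toy `score_fn` are all assumptions made for illustration; this is not the evaluation code used in the paper.

```python
from typing import Callable, List, Sequence


def deletion_auc(
    image: Sequence[float],
    saliency: Sequence[float],
    score_fn: Callable[[List[float]], float],
    baseline: float = 0.0,
) -> float:
    """Area under the score curve as pixels are removed, most salient first."""
    # Pixel indices sorted from most to least salient.
    order = sorted(range(len(image)), key=lambda i: saliency[i], reverse=True)
    current = list(image)
    scores = [score_fn(current)]
    for idx in order:
        current[idx] = baseline          # "delete" the pixel
        scores.append(score_fn(current))
    # Trapezoidal rule with the x-axis (fraction removed) normalized to [0, 1].
    n = len(scores) - 1
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(n)) / n


# Hypothetical toy classifier: the "object" occupies pixels 0 and 1 only.
def toy_score(pixels: List[float]) -> float:
    return (pixels[0] + pixels[1]) / 2


image = [1.0, 1.0, 1.0, 1.0]
good_saliency = [0.9, 0.8, 0.1, 0.0]   # highlights the object
bad_saliency = [0.0, 0.1, 0.8, 0.9]    # highlights the background

good_auc = deletion_auc(image, good_saliency, toy_score)  # 0.25
bad_auc = deletion_auc(image, bad_saliency, toy_score)    # 0.75
```

The insertion metric is the mirror image: pixels are added to a blank or blurred image in the same order, and a higher area under the curve is better.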

Abstract

We present a simple regularization of adversarial perturbations based upon the perceptual loss. While the resulting perturbations remain imperceptible to the human eye, they differ from existing adversarial perturbations in that they are semi-sparse alterations that highlight objects and regions of interest while leaving the background unaltered. As semantically meaningful adversarial perturbations, they form a bridge between counterfactual explanations and adversarial perturbations in the space of images. We evaluate our approach on several standard explainability benchmarks, namely weak localization, insertion-deletion, and the pointing game, demonstrating that perceptually regularized counterfactuals are an effective explanation for image-based classifiers.
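
The abstract does not spell out the objective, so the following is only a toy sketch of what "regularizing an adversarial perturbation with a perceptual loss" could look like, under stated assumptions. It uses a hypothetical two-layer linear model (feature layer `W1`, binary read-out `w`) and minimizes a hinge loss that pushes the logit across the decision boundary, plus a penalty on both the pixel-space change and the feature-space change. The function name `perturb`, the parameters `lam`, `margin`, `lr`, and `steps`, and the exact form of the penalty are illustrative choices, not the authors' formulation. A real perceptual loss would use activations from several layers of a deep network.

```python
import numpy as np


def perturb(x, W1, w, lam=0.1, margin=0.5, lr=0.05, steps=200):
    """Gradient descent on delta for the assumed toy objective
    hinge(s * logit(x + delta) + margin) + lam * (|delta|^2 + |W1 @ delta|^2),
    where logit(z) = w @ (W1 @ z) and s is the sign of logit(x)."""
    v = W1.T @ w                 # gradient of the logit w.r.t. the input (linear model)
    s = np.sign(v @ x)           # side of the decision boundary x starts on
    delta = np.zeros_like(x)
    for _ in range(steps):
        # Hinge term: active until the logit is past the margin on the other side.
        active = s * (v @ (x + delta)) + margin > 0
        grad_cls = s * v if active else np.zeros_like(x)
        # Regularizer: pixel-space change plus feature-space ("perceptual") change.
        grad_reg = 2.0 * lam * (delta + W1.T @ (W1 @ delta))
        delta -= lr * (grad_cls + grad_reg)
    return delta


# Hypothetical toy input and weights, chosen by hand for illustration.
x = np.array([1.0, 0.5, 0.2, 0.8])
W1 = np.array([[1.0, 0.0, 0.5, 0.0],
               [0.0, 1.0, 0.0, 0.5]])
w = np.array([1.0, -0.5])

delta = perturb(x, W1, w)
logit_before = w @ (W1 @ x)            # positive: original class
logit_after = w @ (W1 @ (x + delta))   # negative: the decision has flipped
```

In this linear toy the perturbation is dense; the semi-sparse, object-focused behaviour described in the abstract depends on the deep features of a real network and is not reproduced here.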

Taxonomy

Methods: Counterfactual Explanations