Counterfactual Image Generation for adversarially robust and interpretable Classifiers

Rafael Bischof; Florian Scheidegger; Michael A. Kraus; A. Cristiano I. Malossi

arXiv:2310.00761·cs.CV·October 3, 2023

Open Access

TL;DR

This paper introduces a unified GAN-based framework that generates counterfactual images to improve both the interpretability and the adversarial robustness of neural image classifiers, two problems that existing methods address only separately.

Contribution

The paper presents a novel GAN approach that produces counterfactual samples for interpretability and robustness, combining explainability and adversarial training in a single model.

Findings

01 High-quality saliency maps with competitive IoU scores

02 Enhanced robustness against PGD adversarial attacks

03 Discriminator's 'fakeness' as an uncertainty measure
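
The IoU (intersection over union) score named in the first finding compares a predicted saliency mask with a reference mask. The snippet below is a minimal sketch of that metric on binary masks, not the paper's evaluation code: the function name `iou`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks.

    Masks are nested lists of 0/1 values (an assumed format for this sketch).
    """
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: predicted salient pixels vs. reference pixels.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = iou(predicted, reference)
```

A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no salient pixel.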

Abstract

Neural image classifiers are effective but inherently hard to interpret and susceptible to adversarial attacks. Solutions to both problems exist, among them generating counterfactual examples to enhance explainability and adversarially augmenting training datasets to improve robustness. However, existing methods address only one of the two issues. We propose a unified framework that leverages image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples that highlight salient regions for interpretability and act as adversarial samples to augment the dataset for greater robustness. This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake". We assess the method's effectiveness by evaluating (i) the produced…
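
One way to picture the combined classifier and discriminator is a single output layer with one logit per real class plus one extra "fake" logit. The sketch below assumes exactly that formulation, which the abstract does not confirm; the names `softmax`, `joint_loss`, and `fakeness`, and the choice of the last index for the "fake" class, are hypothetical and only illustrate the idea.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(logits, label, is_generated):
    """Cross-entropy for a head with K real classes plus one 'fake' class.

    Assumption: the last logit is the 'fake' class. Real images are trained
    towards their class label, generated images towards 'fake'.
    """
    probs = softmax(logits)
    fake_index = len(logits) - 1
    target = fake_index if is_generated else label
    return -math.log(probs[target])


def fakeness(logits):
    """Probability mass on the assumed 'fake' class, usable as an uncertainty score."""
    return softmax(logits)[-1]


# Hypothetical logits for a 2-class problem: [class 0, class 1, fake].
example_logits = [2.0, 0.5, -1.0]
loss_if_real = joint_loss(example_logits, label=0, is_generated=False)
loss_if_generated = joint_loss(example_logits, label=0, is_generated=True)
uncertainty = fakeness(example_logits)
```

Under this assumed setup, the same forward pass yields both the class prediction and the "fakeness" score, so no separate discriminator network is needed.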

Taxonomy

Topics: Adversarial Robustness in Machine Learning