On Saliency Maps and Adversarial Robustness

Puneet Mangla; Vedant Singh; Vineeth N Balasubramanian

arXiv:2006.07828·cs.CV·July 14, 2020

TL;DR

This paper introduces Saliency-based Adversarial Training (SAT), a novel method that leverages existing dataset annotations as weak saliency maps to enhance the adversarial robustness of models across multiple datasets.

Contribution

The paper proposes SAT, which treats annotations already provided with a dataset (bounding boxes, segmentation masks) as weak saliency maps to improve adversarial robustness with no additional effort to generate the perturbations, and demonstrates its effectiveness empirically.

Findings

01. SAT improves adversarial robustness on multiple datasets.

02. Finer saliency maps lead to more robust models.

03. Combining SAT with existing methods further boosts performance.

Abstract

A very recent trend has emerged to couple the notions of interpretability and adversarial robustness, unlike earlier efforts that focused solely on good interpretations or on robustness against adversaries. Works have shown that adversarially trained models exhibit more interpretable saliency maps than their non-robust counterparts, and that this behavior can be quantified by considering the alignment between the input image and the saliency map. In this work, we offer a different perspective on this coupling and propose a method, Saliency-based Adversarial Training (SAT), that uses saliency maps to improve the adversarial robustness of a model. In particular, we show that using annotations such as bounding boxes and segmentation masks, already provided with a dataset, as weak saliency maps suffices to improve adversarial robustness with no additional effort to generate the perturbations themselves.…
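To make "annotations as weak saliency maps" concrete, here is a minimal sketch of one way a bounding-box annotation could be turned into a binary map that marks the annotated region as salient. This is not the authors' code. The function name `box_to_weak_saliency`, the `(x_min, y_min, x_max, y_max)` box convention with exclusive maxima, and the choice of a 0/1 map are all assumptions made for illustration. How SAT then uses such a map during training is not described in the text above, so the sketch stops at building the map.

```python
from typing import List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max), maxima exclusive.
Box = Tuple[int, int, int, int]


def box_to_weak_saliency(height: int, width: int, boxes: List[Box]) -> List[List[float]]:
    """Return a height x width map: 1.0 inside any box, 0.0 elsewhere."""
    saliency = [[0.0] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        # Clip to the image bounds so out-of-range annotations are tolerated.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        for y in range(y0, y1):
            for x in range(x0, x1):
                saliency[y][x] = 1.0
    return saliency


if __name__ == "__main__":
    # A 4x5 image with one box covering columns 1-3 and rows 1-2.
    for row in box_to_weak_saliency(4, 5, [(1, 1, 4, 3)]):
        print(row)
```

A segmentation mask could serve the same role directly, since it is already a per-pixel map; under this reading it would simply be a finer-grained version of the box-derived map.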

Taxonomy

Methods; Interpretability