On Saliency Maps and Adversarial Robustness
Puneet Mangla, Vedant Singh, Vineeth N Balasubramanian

TL;DR
This paper introduces Saliency-based Adversarial Training (SAT), a novel method that leverages existing dataset annotations as weak saliency maps to enhance the adversarial robustness of models across multiple datasets.
Contribution
The paper proposes SAT, a new approach that uses weak saliency maps from dataset annotations to improve adversarial robustness without extra perturbation generation, and demonstrates its effectiveness empirically.
Findings
SAT improves adversarial robustness on multiple datasets.
Finer saliency maps lead to more robust models.
Combining SAT with existing methods further boosts performance.
Abstract
A very recent trend has emerged to couple the notions of interpretability and adversarial robustness, unlike earlier efforts, which focused solely on either good interpretations or robustness against adversaries. Works have shown that adversarially trained models exhibit more interpretable saliency maps than their non-robust counterparts, and that this behavior can be quantified by considering the alignment between the input image and the saliency map. In this work, we provide a different perspective on this coupling and propose a method, Saliency-based Adversarial Training (SAT), that uses saliency maps to improve the adversarial robustness of a model. In particular, we show that using annotations such as bounding boxes and segmentation masks, already provided with a dataset, as weak saliency maps suffices to improve adversarial robustness with no additional effort to generate the perturbations themselves.…
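To make the idea of a "weak saliency map" concrete, the sketch below turns a bounding-box annotation into a binary mask over the image grid. It is only an illustration of the annotation-to-mask step the abstract describes, not the authors' implementation: the function name `bbox_to_weak_saliency`, the `(x_min, y_min, x_max, y_max)` box layout with exclusive max edges, and the 0/1 encoding are all assumptions made here, and the excerpt does not specify how SAT consumes such a mask during training.

```python
def bbox_to_weak_saliency(height, width, box):
    """Return a height x width binary mask (list of rows), 1.0 inside the box.

    `box` is (x_min, y_min, x_max, y_max) in pixel coordinates with exclusive
    max edges. This layout is an assumption for illustration only.
    """
    x_min, y_min, x_max, y_max = box
    # Clip the box to the image bounds so out-of-range annotations are safe.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [
        [1.0 if (y_min <= y < y_max and x_min <= x < x_max) else 0.0
         for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 4x6 image with one annotated object.
mask = bbox_to_weak_saliency(4, 6, (1, 1, 4, 3))
for row in mask:
    print(row)
```

A segmentation mask could be used in the same role without any conversion, since it already marks object pixels individually; the bounding-box version is simply a coarser approximation of the same region.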
Taxonomy
Methods, Interpretability
