Salient Information Preserving Adversarial Training Improves Clean and Robust Accuracy
Timothy Redgrave, Adam Czajka

TL;DR
SIP-AT is an adversarial training method that leaves salient image regions unperturbed during training, improving both clean and robust accuracy and easing the usual trade-off between the two.
Contribution
This work introduces SIP-AT, a salience-guided adversarial training technique that maintains meaningful features and enhances model performance on clean data without sacrificing robustness.
Findings
SIP-AT boosts clean accuracy across multiple datasets and architectures.
Models trained with SIP-AT maintain high robustness at various attack epsilon levels.
Human studies show increased difficulty in identifying perturbed images at low epsilon levels.
Abstract
In this work we introduce Salient Information Preserving Adversarial Training (SIP-AT), an intuitive method for relieving the robustness-accuracy trade-off incurred by traditional adversarial training. SIP-AT uses salient image regions to guide the adversarial training process in such a way that fragile features deemed meaningful by an annotator remain unperturbed during training, allowing models to learn highly predictive non-robust features without sacrificing overall robustness. This technique is compatible with both human-based and automatically generated salience estimates, allowing SIP-AT to be used as a part of human-driven model development without forcing SIP-AT to be reliant upon additional human data. We perform experiments across multiple datasets and architectures and demonstrate that SIP-AT is able to boost the clean accuracy of models while maintaining a high degree of…
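The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary salience mask (1 = salient pixel), a toy linear classifier whose input gradient is analytic, and a PGD-style L-infinity attack. All names (`salience_masked_pgd`, `salience_mask`, `eps`, `step`, `n_steps`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_wrt_input(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input of a toy linear classifier."""
    p = sigmoid(np.sum(w * x) + b)
    return (p - y) * w

def salience_masked_pgd(x, y, w, b, salience_mask, eps=8 / 255, step=2 / 255, n_steps=10):
    """PGD-style attack that leaves salient pixels untouched (assumed masking rule)."""
    keep = 1.0 - salience_mask          # 1 where perturbation is allowed
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        grad = loss_grad_wrt_input(np.clip(x + delta, 0.0, 1.0), y, w, b)
        # Gradient-ascent step on the loss, projected onto the L-infinity ball.
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
        # Assumption: zero the perturbation inside the salient region.
        delta = delta * keep
        # Keep the perturbed image inside the valid pixel range [0, 1].
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return x + delta

# Toy usage: an 8x8 "image" whose central 4x4 patch is marked salient.
rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=(8, 8))
w = rng.normal(size=(8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
x_adv = salience_masked_pgd(x, 1.0, w, 0.0, mask)
```

In an adversarial training loop, `x_adv` would replace (or accompany) `x` in each training batch, so that the model sees perturbed backgrounds but unperturbed salient regions. Whether the mask comes from human annotations or an automatic salience estimate does not change this sketch.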
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Neural Networks and Applications
