On the Benefits of Models with Perceptually-Aligned Gradients
Gunjan Aggarwal, Abhishek Sinha, Nupur Kumari, Mayank Singh

TL;DR
This paper demonstrates that perceptually-aligned gradients appear even in models that are not highly adversarially robust, and that adversarial training with a low max-perturbation bound improves performance on zero-shot and weakly supervised localization tasks.
Contribution
It shows that perceptually-aligned gradients are present in less robust models and that low-perturbation adversarial training enhances their performance on zero-shot and weakly supervised localization tasks.
Findings
Perceptually-aligned gradients exist in models with low adversarial robustness.
Low-perturbation adversarial training improves zero-shot and localization performance.
Models trained with a low max-perturbation bound keep interpretable features with only a slight drop in performance on clean samples.
Abstract
Adversarially robust models have been shown to learn more robust and interpretable features than standardly trained models. As shown in \cite{tsipras2018robustness}, such robust models inherit useful interpretable properties: the gradient aligns perceptually well with images, and adding a large targeted adversarial perturbation leads to an image resembling the target class. We perform experiments to show that interpretable and perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks. Specifically, we perform adversarial training with attacks under different max-perturbation bounds. Adversarial training with a low max-perturbation bound results in models that have interpretable features with only a slight drop in performance on clean samples. In this paper, we leverage models with interpretable perceptually-aligned features and show…
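To make the idea of adversarial training under a max-perturbation bound concrete, here is a minimal sketch. The abstract does not name the attack, the norm, or the model, so everything below is an assumption for illustration: an L-infinity bound `eps`, an iterative signed-gradient attack with projection, and a toy logistic-regression classifier in NumPy. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(x @ w + b)
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w  # analytic input gradient for logistic regression
    return loss, grad_x

def linf_attack(w, b, x, y, eps, step, n_steps):
    """Assumed attack: signed-gradient ascent projected into the L-inf ball of radius eps."""
    x_adv = x.copy()
    for _ in range(n_steps):
        _, g = loss_and_input_grad(w, b, x_adv, y)
        x_adv = x_adv + step * np.sign(g)          # step in the loss-increasing direction
        x_adv = np.clip(x_adv, x - eps, x + eps)   # enforce the max-perturbation bound
    return x_adv

def adversarial_train(X, Y, eps, lr=0.1, epochs=50, seed=0):
    """SGD on adversarial examples; a small eps corresponds to the low-perturbation regime."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            x_adv = linf_attack(w, b, x, y, eps, step=eps / 4, n_steps=5)
            p = sigmoid(x_adv @ w + b)
            w -= lr * (p - y) * x_adv  # parameter gradient evaluated at the perturbed input
            b -= lr * (p - y)
    return w, b
```

Sweeping `eps` over several values and comparing clean accuracy would mirror, in spirit only, the comparison across max-perturbation bounds described in the abstract. The input gradient returned by `loss_and_input_grad` is the quantity whose visual alignment with the image is being discussed.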
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
