Are Perceptually-Aligned Gradients a General Property of Robust Classifiers?
Simran Kaur, Jeremy Cohen, Zachary C. Lipton

TL;DR
This paper investigates whether perceptually-aligned gradients are a common feature of robust classifiers, finding evidence that they occur not only in adversarially-trained models but also in classifiers made robust through randomized smoothing.
Contribution
The study provides evidence that perceptually-aligned gradients may be a general property of robust classifiers, extending the earlier finding for adversarially-trained networks to classifiers made robust through randomized smoothing.
Findings
Prior work (Santurkar et al., 2019) showed that perceptually-aligned gradients occur in adversarially-trained neural networks.
Classifiers made robust through randomized smoothing also exhibit perceptually-aligned gradients.
Together, these results support the hypothesis that such gradients may be a general property of robust classifiers.
Abstract
For a standard convolutional neural network, optimizing over the input pixels to maximize the score of some target class will generally produce a grainy-looking version of the original image. However, Santurkar et al. (2019) demonstrated that for adversarially-trained neural networks, this optimization produces images that uncannily resemble the target class. In this paper, we show that these "perceptually-aligned gradients" also occur under randomized smoothing, an alternative means of constructing adversarially-robust classifiers. Our finding supports the hypothesis that perceptually-aligned gradients may be a general property of robust classifiers. We hope that our results will inspire research aimed at explaining this link between perceptually-aligned gradients and adversarial robustness.
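The optimization described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a small linear softmax classifier in place of a convolutional network, a Monte Carlo estimate of a "soft" smoothed classifier (the average softmax output under Gaussian input noise), and normalized gradient ascent with pixels clipped to [0, 1]. The function names, step size, noise level `sigma`, and sample count are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def smoothed_prob_and_grad(W, b, x, target, sigma, n_samples, rng):
    """Monte Carlo estimate of g_t(x) = E[softmax(W(x + eps) + b)_t], eps ~ N(0, sigma^2 I),
    together with its gradient with respect to the input x.

    Assumption: the base classifier is linear (logits = W x + b), so the
    per-sample gradient has the closed form p_t * (W[t] - sum_k p_k W[k]).
    """
    noise = rng.normal(0.0, sigma, size=(n_samples, x.size))
    probs = softmax((x + noise) @ W.T + b)            # shape (n_samples, n_classes)
    p_t = probs[:, target]                            # shape (n_samples,)
    grads = p_t[:, None] * (W[target] - probs @ W)    # shape (n_samples, n_pixels)
    return p_t.mean(), grads.mean(axis=0)


def maximize_target(W, b, x0, target, sigma=0.25, steps=200, lr=0.1,
                    n_samples=64, seed=0):
    """Gradient ascent over input pixels on the smoothed target-class score."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(steps):
        _, g = smoothed_prob_and_grad(W, b, x, target, sigma, n_samples, rng)
        norm = np.linalg.norm(g)
        if norm > 0.0:
            x = x + lr * g / norm      # normalized ascent step
        x = np.clip(x, 0.0, 1.0)       # keep pixels in the valid range
    return x


if __name__ == "__main__":
    # Toy setup: 3 classes, 4 "pixels"; class k responds only to pixel k.
    W = 3.0 * np.eye(3, 4)
    b = np.zeros(3)
    x0 = np.full(4, 0.5)
    x_opt = maximize_target(W, b, x0, target=2)
    print("optimized input:", x_opt)
```

In this toy setting the ascent raises the pixel the target class responds to and lowers the pixels that favor the other classes. Whether the resulting image resembles the target class for a real network is the empirical question the paper studies; the sketch only shows the mechanics of differentiating a noise-averaged score with respect to the input.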
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Face and Expression Recognition
