Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness
Suraj Srinivas, Sebastian Bordt, Hima Lakkaraju

TL;DR
This paper explains why robust computer vision models have gradients aligned with human perception, attributing it to off-manifold robustness, and explores how different robustness regimes influence this alignment and model accuracy.
Contribution
It provides the first theoretical explanation linking off-manifold robustness to perceptually-aligned gradients and empirically validates this connection across various robust training methods.
Findings
Off-manifold robustness causes gradients to align with data manifolds.
Bayes optimal models inherently satisfy off-manifold robustness.
Different robustness regimes impact perceptual alignment and accuracy.
Abstract
One of the remarkable properties of robust computer vision models is that their input-gradients are often aligned with human perception, referred to in the literature as perceptually-aligned gradients (PAGs). Despite only being trained for classification, PAGs cause robust models to have rudimentary generative capabilities, including image generation, denoising, and in-painting. However, the underlying mechanisms behind these phenomena remain unknown. In this work, we provide a first explanation of PAGs via off-manifold robustness, which states that models must be more robust off the data manifold than they are on-manifold. We first demonstrate theoretically that off-manifold robustness leads input gradients to lie approximately on the data manifold, explaining their perceptual alignment. We then show that Bayes optimal models satisfy off-manifold robustness, and confirm the…
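The claim that input gradients "lie approximately on the data manifold" can be made concrete with a small numerical sketch. The example below is a toy illustration only, not the paper's method or experimental protocol: it assumes the data manifold is a linear subspace of R^5 with a known orthonormal tangent basis, and it uses two hypothetical linear models f(x) = w · x, whose input-gradient is simply w. The helper names `split_gradient` and `on_manifold_fraction` are made up for this sketch.

```python
import numpy as np


def split_gradient(grad, tangent_basis):
    """Split an input-gradient into on-manifold and off-manifold parts.

    Assumption: tangent_basis is a (d, k) matrix with orthonormal columns
    spanning the tangent space of the (here: linear) data manifold.
    """
    on = tangent_basis @ (tangent_basis.T @ grad)  # orthogonal projection
    off = grad - on                                # residual, normal to the manifold
    return on, off


def on_manifold_fraction(grad, tangent_basis):
    """Fraction of the squared gradient norm that lies on the manifold."""
    on, _ = split_gradient(grad, tangent_basis)
    return float(np.linalg.norm(on) ** 2 / np.linalg.norm(grad) ** 2)


# Toy setup (hypothetical): data lives on a 2-D subspace of R^5.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # (5, 2), orthonormal columns

# Model A reacts only to on-manifold directions, so it is insensitive
# to any perturbation that moves the input off the manifold.
w_on = basis @ np.array([1.0, -2.0])

# Model B is also sensitive to arbitrary directions, including off-manifold ones.
w_noisy = w_on + 3.0 * rng.normal(size=5)

frac_robust = on_manifold_fraction(w_on, basis)
frac_noisy = on_manifold_fraction(w_noisy, basis)

print(f"on-manifold gradient fraction, model A: {frac_robust:.3f}")
print(f"on-manifold gradient fraction, model B: {frac_noisy:.3f}")
```

In this toy setting, the model that ignores off-manifold directions has its gradient entirely inside the subspace, while the second model spreads part of its gradient into directions normal to it. For real image data the manifold is nonlinear and its tangent space is not known, so any such measurement would need an estimated tangent space; that step is outside the scope of this sketch.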
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
