Origins of Low-dimensional Adversarial Perturbations
Elvis Dohmatob, Chuan Guo, Morgane Goibert

TL;DR
This paper provides a theoretical analysis of low-dimensional adversarial perturbations, explaining their effectiveness and limitations in classification models, especially in relation to the model's margin and input gradients.
Contribution
It derives rigorous lower bounds on the fooling rate of low-dimensional adversarial perturbations and explains their success through theoretical analysis and experiments.
Findings
Fooling rate depends on the model's pointwise margin and on how well the subspace aligns with the model's input gradients.
Universal adversarial perturbations can be smaller in norm than typical data points.
Theoretical bounds are validated by experiments on synthetic and real data.
Abstract
In this paper, we initiate a rigorous study of the phenomenon of low-dimensional adversarial perturbations (LDAPs) in classification. Unlike the classical setting, these perturbations are limited to a subspace of dimension k which is much smaller than the dimension d of the feature space. The case k = 1 corresponds to so-called universal adversarial perturbations (UAPs; Moosavi-Dezfooli et al., 2017). First, we consider binary classifiers under generic regularity conditions (including ReLU networks) and compute analytical lower bounds for the fooling rate of any subspace. These bounds explicitly highlight the dependence of the fooling rate on the pointwise margin of the model (i.e., the ratio of the output to the norm of its gradient at a test point), and on the alignment of the given subspace with the gradients of the model w.r.t. inputs. Our results provide a rigorous…
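To make the roles of margin and subspace alignment concrete, here is a minimal sketch for the simplest case only: a linear binary classifier f(x) = w·x + b with L2-bounded perturbations. This is an illustration under those assumptions, not the paper's general bound. For such a model the input gradient is w everywhere, so a perturbation of norm at most eps confined to a subspace can change the output by at most eps times the norm of the projection of w onto that subspace. The function name `fooling_rate_linear` and the parameters `V` and `eps` are hypothetical names chosen for this example.

```python
import numpy as np


def fooling_rate_linear(X, w, b, V, eps):
    """Fraction of rows of X whose predicted sign can be flipped by an
    L2-bounded perturbation (norm <= eps) restricted to span(V).

    Assumes a linear model f(x) = w @ x + b and that V (shape d x k)
    has full column rank. Illustrative only.
    """
    # Orthonormal basis of the subspace spanned by the columns of V.
    Q, _ = np.linalg.qr(V)
    # Norm of the gradient component lying inside the subspace.
    proj_norm = np.linalg.norm(Q.T @ w)
    scores = X @ w + b
    # The best in-subspace perturbation shifts f(x) by eps * proj_norm,
    # so a point is fooled when |f(x)| is strictly below that amount.
    return float(np.mean(np.abs(scores) < eps * proj_norm))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 50
    w = rng.normal(size=d)
    X = rng.normal(size=(1000, d))

    aligned = w[:, None]               # k = 1 subspace along the gradient
    random_dir = rng.normal(size=(d, 1))  # k = 1 subspace in a random direction

    print("aligned:", fooling_rate_linear(X, w, 0.0, aligned, eps=1.0))
    print("random: ", fooling_rate_linear(X, w, 0.0, random_dir, eps=1.0))
```

When the subspace contains the gradient direction, the fooling condition reduces to the pointwise margin |f(x)| / ‖w‖ being smaller than eps. When the subspace is orthogonal to w, the projection vanishes and no point can be fooled, regardless of eps.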
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis
