Analysis of Dominant Classes in Universal Adversarial Perturbations
Jon Vadillo, Roberto Santana, Jose A. Lozano

TL;DR
This paper investigates why universal adversarial perturbations tend to cause most inputs to be misclassified into a single dominant class, providing experimental analysis and explanations in the audio domain.
Contribution
It offers the first comprehensive analysis of the dominant class phenomenon in universal perturbations, proposing hypotheses and new methods to generate such attacks.
Findings
Universal perturbations often redirect most inputs to a single dominant class, even when no target class is specified during their creation.
Proposed hypotheses explain the geometric and data-feature basis of this phenomenon.
New methods for generating universal adversarial attacks are introduced.
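To make the dominant class notion concrete, the following minimal sketch shows one way it could be measured from model outputs. It is an illustration only, not the authors' code: the function name `dominant_class_stats` and the example labels are hypothetical, and it assumes the predicted labels with and without the perturbation have already been collected.

```python
from collections import Counter


def dominant_class_stats(clean_preds, perturbed_preds):
    """Summarize where a universal perturbation sends the inputs it fools.

    Hypothetical helper for illustration. Returns a tuple
    (dominant_class, dominant_share, fooling_rate), where dominant_share is
    the fraction of fooled inputs that land in the most frequent wrong class.
    """
    if len(clean_preds) != len(perturbed_preds):
        raise ValueError("prediction lists must have the same length")
    if not clean_preds:
        return None, 0.0, 0.0

    # An input counts as fooled when the perturbation changes its prediction.
    fooled = [p for c, p in zip(clean_preds, perturbed_preds) if c != p]
    fooling_rate = len(fooled) / len(clean_preds)
    if not fooled:
        return None, 0.0, fooling_rate

    # The dominant class is the most frequent label among fooled inputs.
    label, count = Counter(fooled).most_common(1)[0]
    return label, count / len(fooled), fooling_rate


# Made-up labels, assumed for this example only.
clean = ["yes", "no", "up", "down", "left", "right"]
perturbed = ["left", "left", "left", "down", "left", "left"]
dominant, share, rate = dominant_class_stats(clean, perturbed)
print(dominant, share, rate)
```

A dominant share close to 1 would indicate that nearly all fooled inputs collapse into one class, which is the behavior the paper sets out to explain.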
Abstract
The reasons why Deep Neural Networks are susceptible to being fooled by adversarial examples remain an open discussion. Indeed, many different strategies can be employed to efficiently generate adversarial attacks, some of them relying on different theoretical justifications. Among these strategies, universal (input-agnostic) perturbations are of particular interest, due to their capability to fool a network independently of the input to which the perturbation is applied. In this work, we investigate an intriguing phenomenon of universal perturbations, which has been reported previously in the literature, yet without a proven justification: universal perturbations change the predicted classes for most inputs into one particular (dominant) class, even if this behavior is not specified during the creation of the perturbation. In order to justify the cause of this phenomenon, we propose a…
