TL;DR
This paper introduces universal adversarial networks, generative networks whose output, when added to clean samples from a dataset, fools a target classifier. The authors report that the technique improves on known universal adversarial attacks.
Contribution
It presents a generative model for universal adversarial perturbations: instead of searching for one fixed perturbation, a network is trained to produce perturbations that cause misclassification when added to clean inputs.
Findings
Universal adversarial networks improve on known universal adversarial attacks.
The approach generalizes well across different classifiers.
Generated perturbations significantly increase misclassification rates.
Abstract
Neural networks are known to be vulnerable to adversarial examples: inputs that have been intentionally perturbed to remain visually similar to the source input but cause a misclassification. It was recently shown that, given a dataset and classifier, there exist so-called universal adversarial perturbations: a single perturbation that causes a misclassification when applied to any input. In this work, we introduce universal adversarial networks, a generative network that is capable of fooling a target classifier when its generated output is added to a clean sample from a dataset. We show that this technique improves on known universal adversarial attacks.
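The mechanism the abstract describes (a generator's output added to a clean sample) can be illustrated with a minimal sketch. This is not the authors' implementation. The one-layer generator, the L-infinity budget `eps`, the clipping to [0, 1], the toy `classify` function, and all names below are assumptions made for illustration, and the generator here is untrained. In the paper's setting, the generator would be trained so that the resulting fooling rate against the target classifier is high.

```python
import math
import random

def make_generator(latent_dim, out_dim, seed=0):
    """Random weights for a hypothetical one-layer generator (toy stand-in)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(out_dim)]

def generate_perturbation(weights, z, eps):
    """Map latent vector z to a perturbation with L-infinity norm at most eps."""
    # tanh bounds each coordinate to [-1, 1]; scaling by eps enforces the budget.
    return [eps * math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in weights]

def apply_perturbation(x, delta):
    """Add delta to a clean sample x and clip to the assumed input range [0, 1]."""
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]

def fooling_rate(classify, samples, delta):
    """Fraction of samples whose predicted label changes once delta is added."""
    flipped = sum(classify(apply_perturbation(x, delta)) != classify(x) for x in samples)
    return flipped / len(samples)

# Toy usage: a threshold "classifier" on the mean of a 4-dimensional input.
eps = 0.1
weights = make_generator(latent_dim=3, out_dim=4)
z = [0.5, -1.0, 2.0]
delta = generate_perturbation(weights, z, eps)  # one perturbation for all samples
samples = [[0.48, 0.52, 0.50, 0.49], [0.9, 0.8, 0.7, 0.95], [0.1, 0.2, 0.05, 0.3]]
classify = lambda x: int(sum(x) / len(x) > 0.5)
rate = fooling_rate(classify, samples, delta)
```

The same `delta` is applied to every sample, which is what makes the perturbation universal. Sampling a different `z` would give a different perturbation from the same generator.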
