Learning Universal Adversarial Perturbations with Generative Models

Jamie Hayes; George Danezis

arXiv:1708.05207 · cs.CR · January 8, 2018

TL;DR

This paper introduces universal adversarial networks, a generative approach to universal adversarial perturbations: an output of the generator, added to any clean input from a dataset, can fool a target classifier. The approach improves on previously known universal adversarial attacks.

Contribution

It presents a novel generative model that learns universal adversarial perturbations, so that a single generated perturbation causes misclassifications across many inputs.

Findings

01 Universal adversarial networks improve on previously known universal adversarial attacks.

02 The approach generalizes well across different classifiers.

03 Generated perturbations significantly increase misclassification rates.

Abstract

Neural networks are known to be vulnerable to adversarial examples: inputs that have been intentionally perturbed to remain visually similar to the source input but cause a misclassification. It was recently shown that, given a dataset and classifier, there exist so-called universal adversarial perturbations: a single perturbation that causes a misclassification when applied to any input. In this work, we introduce universal adversarial networks, a generative network that is capable of fooling a target classifier when its generated output is added to a clean sample from a dataset. We show that this technique improves on known universal adversarial attacks.
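The abstract's core idea (one generated perturbation, added unchanged to every clean input, should flip the classifier's prediction) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration, not the authors' method: the target is a toy linear softmax classifier, the "generator" is a hypothetical single linear layer whose tanh output bounds the perturbation to an L-infinity budget `eps`, and training is plain gradient ascent on the classifier's cross-entropy over the clean predictions. The names `train_generator`, `generate`, and `fooling_rate` are hypothetical; the paper's actual architecture, norm constraint, and loss may differ (see the jhayes14/UAN repository for the official PyTorch code).

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def generate(A, c, z, eps):
    # Hypothetical linear generator: latent noise z -> perturbation in [-eps, eps].
    return eps * np.tanh(A @ z + c)

def fooling_rate(X, labels, W, b, delta):
    # Fraction of inputs whose prediction changes when the SAME delta is added to all of them.
    preds = np.argmax((X + delta) @ W.T + b, axis=1)
    return float(np.mean(preds != labels))

def train_generator(X, W, b, eps=0.5, latent_dim=8, steps=300, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = W.shape[0]
    labels = np.argmax(X @ W.T + b, axis=1)  # attack the classifier's own clean predictions
    onehot = np.eye(k)[labels]
    A = 0.01 * rng.standard_normal((d, latent_dim))
    c = np.zeros(d)
    for _ in range(steps):
        z = rng.standard_normal(latent_dim)  # fresh noise sample each step
        u = A @ z + c
        delta = eps * np.tanh(u)  # one perturbation shared by every input
        probs = softmax((X + delta) @ W.T + b)
        # Gradient of the mean cross-entropy with respect to the shared perturbation.
        grad_delta = (probs - onehot).mean(axis=0) @ W
        # Chain rule through delta = eps * tanh(u).
        grad_u = grad_delta * eps * (1.0 - np.tanh(u) ** 2)
        # Gradient ascent: make the classifier's loss larger.
        A += lr * np.outer(grad_u, z)
        c += lr * grad_u
    return A, c, labels

# Toy data and a random linear classifier (assumed setup, 3 classes, 20 features).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
W = rng.standard_normal((3, 20))
b = np.zeros(3)

A, c, labels = train_generator(X, W, b)
eval_rng = np.random.default_rng(2)
rates = [fooling_rate(X, labels, W, b, generate(A, c, eval_rng.standard_normal(8), 0.5))
         for _ in range(20)]
print(f"fooling rate with no perturbation: {fooling_rate(X, labels, W, b, np.zeros(20)):.2f}")
print(f"mean fooling rate over 20 generated perturbations: {np.mean(rates):.2f}")
```

Because each sampled perturbation is added identically to all 200 inputs, the measured fooling rate is a property of a single universal perturbation rather than of per-input attacks. The tanh bound is one simple way to enforce a norm budget; a rescaling to a fixed norm would be an equally plausible choice.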

Code & Models

Repositories

jhayes14/UAN (PyTorch, official)