TL;DR
This paper uses generative models to create natural-looking adversarial examples: perturbations that imitate real objects or sounds and fool classifiers in both the image and audio domains.
Contribution
It presents a systematic approach employing generative adversarial networks and placement algorithms to produce natural adversarial perturbations that effectively deceive classifiers.
Findings
GAN-generated patches that look like natural objects can fool image classifiers.
Audio perturbations resembling natural sounds can deceive speech models.
The method is fast and adaptable to different data modalities.
Abstract
In adversarial attacks intended to confound deep learning models, most studies have focused on limiting the magnitude of the modification so that humans do not notice the attack. On the other hand, during an attack against autonomous cars, for example, most drivers would not find it strange if a small insect image were placed on a stop sign, or they may overlook it. In this paper, we present a systematic approach to generate natural adversarial examples against classification models by employing such natural-appearing perturbations that imitate a certain object or signal. We first show the feasibility of this approach in an attack against an image classifier by employing generative adversarial networks that produce image patches that have the appearance of a natural object to fool the target model. We also introduce an algorithm to optimize placement of the perturbation in accordance…
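The idea of choosing where to place a natural-looking patch can be illustrated with a minimal sketch. This is not the paper's placement algorithm, which the truncated abstract does not describe; it is a plain exhaustive grid search. The names `paste_patch`, `best_placement`, and `toy_score` are hypothetical, and `toy_score` is a stand-in for a real classifier's confidence in the true class.

```python
import numpy as np

def paste_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` overlaid at (top, left)."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

def best_placement(image, patch, score_fn, stride=4):
    """Scan placements on a grid and return (score, top, left) with the lowest score.

    Assumption: a lower score means the target model is less confident in
    the true class, so the attack is more successful.
    """
    H, W = image.shape[:2]
    h, w = patch.shape[:2]
    best = None
    for top in range(0, H - h + 1, stride):
        for left in range(0, W - w + 1, stride):
            s = score_fn(paste_patch(image, patch, top, left))
            if best is None or s < best[0]:
                best = (s, top, left)
    return best

def toy_score(img):
    # Hypothetical stand-in for a classifier: mean brightness of the
    # central 8x8 region, as if the model only "looked" at the centre.
    return float(img[12:20, 12:20].mean())

image = np.ones((32, 32, 3), dtype=np.float32)   # plain white image
patch = np.zeros((8, 8, 3), dtype=np.float32)    # plain black patch
score, top, left = best_placement(image, patch, toy_score)
print(score, top, left)
```

In a real attack, `score_fn` would wrap the target model's output for the true class, and the patch would come from a generator rather than being a constant block. A grid search is only the simplest option; a gradient-based or learned placement could replace it.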
