Adversarial examples by perturbing high-level features in intermediate decoder layers
Vojtěch Čermák, Lukáš Adam

TL;DR
This paper introduces a novel adversarial attack that perturbs high-level features in intermediate decoder layers of a generative model, producing semantically meaningful adversarial images that are much less vulnerable to steganographic defenses.
Contribution
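It presents a new approach to generating adversarial examples: it perturbs intermediate decoder features and minimizes the Wasserstein distance between the adversarial and initial images under a misclassification constraint. It demonstrates the method's effectiveness on the MNIST and ImageNet datasets in both targeted and untargeted settings.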
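One way to state the optimization problem described in the abstract, written as a hedged sketch in notation chosen here rather than taken from the paper: E is the encoder, the decoder is split as D = D_2 ∘ D_1 at the perturbed intermediate layer, x is the initial image with label y, C_j are the classifier's class scores, t is the target class in the targeted setting, α is a step size, and P is an inexact projection onto the set of perturbations that satisfy the misclassification constraint.

```latex
\begin{aligned}
h_0 &= D_1\big(E(x)\big), \qquad x_{\mathrm{adv}}(\delta) = D_2(h_0 + \delta), \\
\min_{\delta}\; & \mathcal{W}\big(x_{\mathrm{adv}}(\delta),\, x\big)
  \quad \text{s.t.} \quad \arg\max_{j} C_j\big(x_{\mathrm{adv}}(\delta)\big) \neq y
  \quad \big(\text{targeted: } \arg\max_{j} C_j\big(x_{\mathrm{adv}}(\delta)\big) = t\big), \\
\delta^{k+1} &= P\Big(\delta^{k} - \alpha\, \nabla_{\delta}\, \mathcal{W}\big(x_{\mathrm{adv}}(\delta^{k}),\, x\big)\Big).
\end{aligned}
```

Because P returns a point that satisfies the constraint, every iterate δ^k decodes to a misclassified image, which is the feasibility property the abstract emphasizes.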
Findings
The generated adversarial images are much less vulnerable to steganographic defenses than pixel-based perturbations.
The perturbations modify key image features such as edges and colors.
Defense techniques based on adversarial training are vulnerable to these attacks.
Abstract
We propose a novel method for creating adversarial examples. Instead of perturbing pixels, we use an encoder-decoder representation of the input image and perturb intermediate layers in the decoder. This changes the high-level features provided by the generative model. Therefore, our perturbation possesses semantic meaning, such as a longer beak or green tints. We formulate this task as an optimization problem by minimizing the Wasserstein distance between the adversarial and initial images under a misclassification constraint. We employ the projected gradient method with a simple inexact projection. Due to the projection, all iterations are feasible, and our method always generates adversarial images. We perform numerical experiments on the MNIST and ImageNet datasets in both targeted and untargeted settings. We demonstrate that our adversarial images are much less vulnerable to…
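The projected gradient scheme described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the linear "decoder tail" `A`, the linear classifier weights `Wc`, and all helper names are hypothetical; squared Euclidean distance stands in for the Wasserstein distance; and bisection along the segment back toward the last feasible iterate is one simple inexact projection, not necessarily the one the authors use. It does reproduce the property the abstract highlights: once a feasible (misclassified) starting point is found, every iterate stays misclassified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins, NOT the paper's networks:
# h0 plays the role of the intermediate decoder activation being perturbed,
# A collapses the remaining decoder layers into one linear map, and Wc holds
# the weights of a linear 3-class classifier.
A = rng.normal(size=(16, 4))       # remaining decoder layers: 4 -> 16 "pixels"
Wc = rng.normal(size=(3, 16))      # classifier: 16 pixels -> 3 logits
h0 = rng.normal(size=4)            # intermediate activation of the clean image
y = int(np.argmax(Wc @ (A @ h0)))  # class predicted for the clean reconstruction


def margin_and_grad(delta):
    """Logit of y minus the strongest other logit (< 0 means misclassified),
    plus its gradient with respect to the perturbation delta."""
    logits = Wc @ (A @ (h0 + delta))
    masked = logits.copy()
    masked[y] = -np.inf
    k = int(np.argmax(masked))  # strongest competing class
    return logits[y] - logits[k], (Wc[y] - Wc[k]) @ A


def distance(delta):
    # Stand-in for the Wasserstein distance: squared Euclidean distance
    # between the perturbed and clean reconstructions, ||A delta||^2.
    return float(np.sum((A @ delta) ** 2))


def distance_grad(delta):
    return 2.0 * A.T @ (A @ delta)


def project_to_feasible(candidate, anchor, iters=30):
    """Inexact projection: bisect on the segment anchor -> candidate and keep
    the feasible point closest to candidate (anchor must be feasible)."""
    if margin_and_grad(candidate)[0] < 0:
        return candidate
    lo, hi = 0.0, 1.0  # t = lo is always feasible, t = hi is not
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if margin_and_grad(anchor + mid * (candidate - anchor))[0] < 0:
            lo = mid
        else:
            hi = mid
    return anchor + lo * (candidate - anchor)


# Phase 1: reach a first feasible (misclassified) perturbation by stepping
# down the margin.
delta = np.zeros(4)
m, g = margin_and_grad(delta)
while m >= 0:
    delta = delta - (m + 1e-3) / (g @ g) * g
    m, g = margin_and_grad(delta)

# Phase 2: projected gradient on the distance; every iterate stays feasible.
step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of distance_grad
history = [distance(delta)]
feasible = [margin_and_grad(delta)[0] < 0]
for _ in range(200):
    candidate = delta - step * distance_grad(delta)
    delta = project_to_feasible(candidate, anchor=delta)
    history.append(distance(delta))
    feasible.append(margin_and_grad(delta)[0] < 0)

print(f"distance: {history[0]:.3f} -> {history[-1]:.3f}, all feasible: {all(feasible)}")
```

Because the projection only ever moves back toward a point that is already misclassified, feasibility never has to be restored after the fact. With a convex distance and a step of 1/L, each projected iterate is also no farther from the clean reconstruction than the previous one; the price of the inexact projection is that the result lies near, not exactly at, the closest feasible point.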
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis
