Generating Out of Distribution Adversarial Attack using Latent Space Poisoning
Ujjwal Upadhyay, Prerana Mukherjee

TL;DR
This paper introduces a novel adversarial attack method that manipulates the latent space of images using a disentangled variational autoencoder, generating out-of-distribution examples that fool robust classifiers without altering the original images.
Contribution
It presents a new latent space poisoning technique that creates out-of-distribution adversarial examples by tampering with the latent representations, bypassing gradient-based defenses.
Findings
Successfully fools classifiers on the MNIST, SVHN, and CelebA datasets.
Outperforms traditional gradient-based attacks in robustness tests.
Generates perceptually similar but misclassified images.
Abstract
Traditional adversarial attacks rely upon the perturbations generated by gradients from the network which are generally safeguarded by gradient guided search to provide an adversarial counterpart to the network. In this paper, we propose a novel mechanism of generating adversarial examples where the actual image is not corrupted rather its latent space representation is utilized to tamper with the inherent structure of the image while maintaining the perceptual quality intact and to act as legitimate data samples. As opposed to gradient-based attacks, the latent space poisoning exploits the inclination of classifiers to model the independent and identical distribution of the training dataset and tricks it by producing out of distribution samples. We train a disentangled variational autoencoder (beta-VAE) to model the data in latent space and then we add noise perturbations using a…
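To make the idea in the abstract concrete, here is a minimal sketch of perturbing a latent code rather than the image itself. It is not the authors' implementation. The abstract is truncated before it names the noise mechanism, so the Gaussian noise and the random search loop below are assumptions. The linear `encode`/`decode` maps are hypothetical stand-ins for a trained beta-VAE, and `classify` is a hypothetical linear classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 16, 4  # input and latent dimensions (illustrative values)

# Hypothetical stand-ins for a trained beta-VAE encoder/decoder:
# a random linear map and its pseudo-inverse.
W_enc = rng.normal(size=(K, D)) / np.sqrt(D)
W_dec = np.linalg.pinv(W_enc)  # shape (D, K)

def encode(x):
    return W_enc @ x

def decode(z):
    return W_dec @ z

# Hypothetical binary linear classifier standing in for the target model.
w_clf = rng.normal(size=D)

def classify(x):
    return int(w_clf @ x > 0)

def latent_attack(x, sigma=0.5, max_tries=1000):
    """Search for a latent perturbation that flips the classifier's label.

    Assumption: isotropic Gaussian noise with random search; the source
    does not specify the perturbation scheme in the visible text.
    Returns (x_adv, z_adv), or (None, None) if no flip was found.
    """
    z = encode(x)
    y_clean = classify(decode(z))  # label of the clean reconstruction
    for _ in range(max_tries):
        z_adv = z + sigma * rng.normal(size=z.shape)  # perturb the latent code only
        x_adv = decode(z_adv)                         # map back to input space
        if classify(x_adv) != y_clean:
            return x_adv, z_adv
    return None, None
```

The input `x` is never modified directly: only its latent code is perturbed, and the decoder produces the candidate sample. A real attack would also constrain the perturbation so that the decoded sample stays perceptually close to the original.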
Taxonomy
Methods
