Adversarial Manipulation of Deep Representations
Sara Sabour, Yanshuai Cao, Fartash Faghri, David J. Fleet

TL;DR
This paper demonstrates that imperceptible perturbations to an image can make its internal deep neural network (DNN) representation closely resemble that of a different image, revealing vulnerabilities in how DNNs encode visual information.
Contribution
It introduces a new class of adversarial images that alter a DNN's internal representations while leaving the image's perceptual appearance essentially unchanged. Earlier adversarial attacks instead perturbed images to produce erroneous class labels.
Findings
Adversarial images can mimic internal representations of different images.
Minor, imperceptible perturbations can cause significant internal representation shifts.
These results challenge assumptions about the stability and interpretability of DNN representations.
Abstract
We show that the representation of an image in a deep neural network (DNN) can be manipulated to mimic those of other natural images, with only minor, imperceptible perturbations to the original image. Previous methods for generating adversarial images focused on image perturbations designed to produce erroneous class labels, while we concentrate on the internal layers of DNN representations. In this way our new class of adversarial images differs qualitatively from others. While the adversary is perceptually similar to one image, its internal representation appears remarkably similar to a different image, one from a different class, bearing little if any apparent similarity to the input; they appear generic and consistent with the space of natural images. This phenomenon raises questions about DNN representations, as well as the properties of natural images themselves.
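To make the idea in the abstract concrete, here is a minimal sketch of how a small, bounded perturbation can pull one input's internal features toward those of another. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify an optimizer, so plain projected gradient descent is used here; the "network" is a hypothetical one-layer toy map (`tanh(W @ x)`) rather than a real DNN; and the names `features`, `feature_adversary`, `source`, `guide`, `delta`, `step`, and `iters` are all chosen for this example.

```python
import numpy as np


def features(x, W):
    """Toy stand-in for an internal DNN layer: a linear map followed by tanh."""
    return np.tanh(W @ x)


def feature_loss_and_grad(x, W, target_feat):
    """Squared L2 distance to the target features and its gradient w.r.t. x."""
    f = features(x, W)
    diff = f - target_feat
    loss = float(diff @ diff)
    # Chain rule: d/dx ||tanh(Wx) - t||^2 = W^T [2 * diff * (1 - tanh(Wx)^2)]
    grad = W.T @ (2.0 * diff * (1.0 - f ** 2))
    return loss, grad


def feature_adversary(source, guide, W, delta=0.05, step=0.02, iters=200):
    """Perturb `source` by at most `delta` per pixel so that its features
    move toward the features of `guide` (assumed optimizer: projected
    gradient descent)."""
    target_feat = features(guide, W)
    x = source.copy()
    for _ in range(iters):
        _, grad = feature_loss_and_grad(x, W, target_feat)
        x = x - step * grad                             # gradient step
        x = np.clip(x, source - delta, source + delta)  # stay close to source
        x = np.clip(x, 0.0, 1.0)                        # stay a valid image
    return x


rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64)) / np.sqrt(64)  # hypothetical layer weights
source = rng.random(64)                       # flattened toy "source" image
guide = rng.random(64)                        # flattened toy "guide" image
adv = feature_adversary(source, guide, W)

loss_before, _ = feature_loss_and_grad(source, W, features(guide, W))
loss_after, _ = feature_loss_and_grad(adv, W, features(guide, W))
print("max pixel change:", np.abs(adv - source).max())
print("feature distance before/after:", loss_before, loss_after)
```

The two `np.clip` calls carry the constraint: every pixel of `adv` stays within `delta` of `source`, while the objective only looks at the feature space. On this toy map the feature distance shrinks but need not vanish; nothing here reproduces the paper's experiments on real networks.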
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis
