Reverse engineering adversarial attacks with fingerprints from adversarial examples
David Aaron Nicholson, Vincent Emanuele

TL;DR
This paper demonstrates that adversarial attack algorithms can be reverse engineered by classifying the fingerprints they leave in adversarial examples. Accuracy remains high even when the perturbation is estimated rather than known, which advances attribution methods in adversarial machine learning.
Contribution
The paper frames reverse engineering of adversarial attacks as a supervised classification problem over adversarial examples and introduces a signal-processing method for estimating attack fingerprints, reporting high accuracy across multiple attack types.
Findings
ResNet50 classifies attack perturbations with 99.4% accuracy.
The JPEG algorithm serves as an effective fingerprinting method, with 85.05% accuracy.
Fingerprinting can be performed without access to actual perturbations.
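
To make the last two findings concrete, here is a minimal sketch of how a fingerprint might be estimated from an adversarial image alone. It assumes the fingerprint is the residual between the image and its JPEG-compressed copy; the summary above does not specify the exact procedure, so the function name `jpeg_fingerprint`, the `quality` setting, and the residual definition are all illustrative assumptions rather than the authors' implementation.

```python
import io

import numpy as np
from PIL import Image


def jpeg_fingerprint(adv_image: np.ndarray, quality: int = 75) -> np.ndarray:
    """Estimate a perturbation fingerprint without the benign image.

    Assumption (not confirmed by the source): the fingerprint is the
    adversarial image minus its JPEG-compressed copy.
    adv_image is an HxWx3 uint8 array; the result is a signed int16 array.
    """
    buffer = io.BytesIO()
    # Round-trip the image through lossy JPEG compression in memory.
    Image.fromarray(adv_image).save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    compressed = np.asarray(Image.open(buffer).convert("RGB"))
    # Widen to int16 so the subtraction can hold negative values.
    return adv_image.astype(np.int16) - compressed.astype(np.int16)


# Usage with a random stand-in for an adversarial image.
rng = np.random.default_rng(0)
adv = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
fingerprint = jpeg_fingerprint(adv)
```

The resulting array has the same shape as the input and could be passed to a classifier in place of the true perturbation.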
Abstract
In spite of intense research efforts, deep neural networks remain vulnerable to adversarial examples: inputs that force the network to confidently produce incorrect outputs. Adversarial examples are typically generated by an attack algorithm that optimizes a perturbation added to a benign input. Many such algorithms have been developed. If it were possible to reverse engineer attack algorithms from adversarial examples, this could deter bad actors because of the possibility of attribution. Here we formulate reverse engineering as a supervised learning problem where the goal is to assign an adversarial example to a class that represents the algorithm and parameters used. To our knowledge it has not been previously shown whether this is even possible. We first test whether we can classify the perturbations added to images by attacks on undefended single-label image classification…
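
The supervised formulation in the abstract can be sketched as follows: each class is one (algorithm, parameter) pair, and each training input is the perturbation, that is, the adversarial example minus the benign input. The attack names `fgsm` and `pgd`, the epsilon values, and the helper names are hypothetical placeholders chosen for illustration; the truncated abstract does not list the attacks or parameters actually used.

```python
from itertools import product

import numpy as np

# Hypothetical attack names and parameter values, for illustration only.
ATTACKS = ["fgsm", "pgd"]
EPSILONS = [2 / 255, 8 / 255]

# One class per (algorithm, parameter) pair.
CLASSES = list(product(ATTACKS, EPSILONS))
CLASS_INDEX = {pair: i for i, pair in enumerate(CLASSES)}


def true_perturbation(benign: np.ndarray, adversarial: np.ndarray) -> np.ndarray:
    """Return the perturbation an attack added to a benign input."""
    return adversarial - benign


def make_training_pair(benign, adversarial, attack, epsilon):
    """Build one (input, label) pair for the attack classifier."""
    delta = true_perturbation(benign, adversarial)
    return delta, CLASS_INDEX[(attack, epsilon)]


# Usage with a toy "image" shifted uniformly by epsilon.
benign = np.zeros((4, 4, 3), dtype=np.float32)
adversarial = benign + np.float32(8 / 255)
delta, label = make_training_pair(benign, adversarial, "pgd", 8 / 255)
```

Any standard image classifier could then be trained on such pairs; the choice of architecture is independent of this labeling scheme.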
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Test
