Reverse engineering adversarial attacks with fingerprints from adversarial examples

David Aaron Nicholson; Vincent Emanuele

arXiv:2301.13869·cs.AI·February 2, 2023

TL;DR

This paper shows that the algorithm behind an adversarial attack can be identified from the adversarial examples it produces. A classifier trained on attack perturbations reaches high accuracy, and accuracy remains high when the perturbations are only estimated ("fingerprinted") with signal processing, which advances attribution methods in adversarial machine learning.

Contribution

The paper frames reverse engineering of attacks as supervised classification of adversarial examples and introduces a fingerprinting method that estimates perturbations with signal processing, reporting high accuracy across multiple attack types.

Findings

01 A ResNet50 trained to classify attack perturbations reaches 99.4% accuracy.

02 The JPEG algorithm serves as an effective fingerprinter, reaching 85.05% accuracy.

03 Fingerprinting can be performed without access to the actual perturbations.
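
The fingerprinting idea in the findings above can be illustrated with a small sketch. It assumes the fingerprint is the residual between an adversarial image and a lossy reconstruction of it, which is one plausible reading of "estimating the perturbation with signal processing" and is not confirmed by this page. To stay dependency-free, the sketch swaps in a 3x3 mean filter where the paper uses JPEG; `mean_filter`, `fingerprint`, and the toy images are hypothetical names and data introduced only for illustration.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 1]


def mean_filter(img: Image) -> Image:
    """3x3 mean filter with edge clamping.

    Assumption: this is a stand-in for the lossy JPEG step, not JPEG itself.
    """
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)  # clamp row index
                    cc = min(max(c + dc, 0), w - 1)  # clamp column index
                    total += img[rr][cc]
            row.append(total / 9.0)
        out.append(row)
    return out


def fingerprint(x_adv: Image,
                reconstruct: Callable[[Image], Image] = mean_filter) -> Image:
    """Estimate the perturbation as the residual x_adv - reconstruct(x_adv)."""
    x_hat = reconstruct(x_adv)
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(x_adv, x_hat)]


# Toy data: a flat benign image plus a small checkerboard perturbation.
benign = [[0.5] * 4 for _ in range(4)]
delta = [[0.02 if (r + c) % 2 == 0 else -0.02 for c in range(4)]
         for r in range(4)]
x_adv = [[b + d for b, d in zip(row_b, row_d)]
         for row_b, row_d in zip(benign, delta)]

# The estimate is computed from x_adv alone; benign and delta are not used.
estimate = fingerprint(x_adv)
```

In this toy case the estimate recovers the sign pattern of the true perturbation without ever seeing the benign image. A real pipeline would pass a JPEG compress-and-decode function as `reconstruct` and feed the resulting fingerprints to a classifier.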

Abstract

In spite of intense research efforts, deep neural networks remain vulnerable to adversarial examples: inputs that force the network to confidently produce incorrect outputs. Adversarial examples are typically generated by an attack algorithm that optimizes a perturbation added to a benign input. Many such algorithms have been developed. If it were possible to reverse engineer attack algorithms from adversarial examples, this could deter bad actors because of the possibility of attribution. Here we formulate reverse engineering as a supervised learning problem where the goal is to assign an adversarial example to a class that represents the algorithm and parameters used. To our knowledge it has not been previously shown whether this is even possible. We first test whether we can classify the perturbations added to images by attacks on undefended single-label image classification…
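
The supervised formulation in the abstract, where each class represents an attack algorithm together with its parameters, can be sketched as a label map. The attack names and parameter values below are placeholders invented for this example, not the ones used in the paper, and `label_for` is a hypothetical helper.

```python
from itertools import product

# Hypothetical attack names and parameter values, for illustration only.
attacks = ["attack_a", "attack_b"]
epsilons = [0.01, 0.03]

# One class per (algorithm, parameter) combination, indexed in product order.
class_index = {combo: i for i, combo in enumerate(product(attacks, epsilons))}


def label_for(attack: str, epsilon: float) -> int:
    """Return the class label for an adversarial example's generating setup."""
    return class_index[(attack, epsilon)]
```

With two algorithms and two parameter settings this yields four classes. A classifier trained on perturbations, or on estimates of them, would predict one of these indices for each adversarial example.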

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Test