Tracing the Origin of Adversarial Attack for Forensic Investigation and Deterrence
Han Fang, Jiyi Zhang, Yupeng Qiu, Ke Xu, Chengfang Fang, Ee-Chien Chang

TL;DR
This paper introduces a two-stage framework to trace the origin of adversarial attacks in neural networks, aiding forensic investigation and deterrence by embedding unique, traceable features into model copies.
Contribution
The paper proposes a novel separate-and-trace framework with a noise-sensitive training loss to identify the source model of adversarial examples.
Findings
Effective tracing of adversarial example origins across architectures
Framework applicable to various datasets and models
Enhanced forensic investigation capabilities
Abstract
Deep neural networks are vulnerable to adversarial attacks. In this paper, we take the role of investigators who want to trace the attack and identify the source, that is, the particular model from which the adversarial examples are generated. The techniques derived would aid forensic investigation of attack incidents and serve as a deterrent to potential attacks. We consider the buyers-seller setting, where a machine learning model is to be distributed to various buyers and each buyer receives a slightly different copy with the same functionality. A malicious buyer generates adversarial examples from a particular copy and uses them to attack other copies. From these adversarial examples, the investigator wants to identify the source. To address this problem, we propose a two-stage separate-and-trace framework. The model separation stage generates multiple copies…
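To make the buyers-seller setting concrete, here is a toy sketch in Python. It is not the paper's method: the excerpt above does not describe the noise-sensitive loss or the tracer, so this sketch substitutes much simpler assumptions. Each copy is a linear scorer whose weights are a shared base plus copy-specific noise, the attacker uses a one-step FGSM-style perturbation, and the investigator is assumed to know the clean input and label. All names (`make_copies`, `fgsm_linear`, `trace_source`) and all parameter values are hypothetical.

```python
import numpy as np

def make_copies(base_w, n_copies, noise_scale, rng):
    """Seller side (toy assumption): each buyer's copy is the base weight
    vector plus its own random perturbation."""
    return [base_w + noise_scale * rng.standard_normal(base_w.shape)
            for _ in range(n_copies)]

def fgsm_linear(w, x, y, eps):
    """Attacker side: one-step FGSM on a linear scorer f(x) = w.x with
    label y in {-1, +1}. The margin y * (w.x) drops fastest along
    -y * sign(w), so we step eps in that direction."""
    return x - eps * y * np.sign(w)

def trace_source(copies, x_clean, x_adv, y):
    """Investigator side (toy assumption: the clean input is known).
    Score each copy by how well the observed perturbation aligns with
    the FGSM direction that copy would have produced."""
    delta = x_adv - x_clean
    scores = [float(np.dot(delta, -y * np.sign(w))) for w in copies]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
base_w = rng.standard_normal(50)           # shared "functionality"
copies = make_copies(base_w, n_copies=5, noise_scale=0.5, rng=rng)

x = rng.standard_normal(50)                # a clean input
y = 1                                      # its label
source = 3                                 # index of the malicious buyer's copy
x_adv = fgsm_linear(copies[source], x, y, eps=0.1)

traced, scores = trace_source(copies, x, x_adv, y)
print("traced copy:", traced)
print("alignment scores:", [round(s, 3) for s in scores])
```

In this toy, the source copy reaches the maximum possible alignment score because the perturbation was built from its own gradient sign, while every other copy loses score wherever its weight signs differ. The setting in the abstract is harder: the investigator starts from the adversarial examples themselves, which is what the paper's two-stage framework is meant to handle.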
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Forensic and Genetic Research
