Spectral Signatures in Backdoor Attacks
Brandon Tran, Jerry Li, Aleksander Madry

TL;DR
This paper identifies spectral signatures as a common property of backdoor attacks, enabling detection and removal of poisoned data in neural networks to improve security.
Contribution
It introduces the concept of spectral signatures in backdoor attacks and demonstrates their use in detecting and mitigating such attacks in real-world neural networks.
Findings
Spectral signatures are present in all known backdoor attacks.
Tools from robust statistics can effectively detect poisoned data.
Detecting and removing the poisoned examples improves a network's security against backdoor attacks.
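A minimal sketch of how a spectral outlier filter of this kind could look, assuming each training example of one class has already been mapped to a feature vector (for instance, a network's penultimate-layer activations). Scoring each example by its squared projection onto the top singular direction of the centered features is an assumption made for illustration, as are the function names, the synthetic data, and the `num_to_remove` budget. The code uses only the Python standard library, so a plain power iteration stands in for a proper SVD routine.

```python
import random


def top_direction(rows, iters=100, seed=0):
    """Approximate the top right singular vector of `rows` by power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        # Apply (R^T R) to v without forming the dim x dim matrix.
        proj = [sum(r[j] * v[j] for j in range(dim)) for r in rows]
        w = [sum(proj[i] * rows[i][j] for i in range(len(rows)))
             for j in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # all rows are zero; keep the current unit vector
        v = [x / norm for x in w]
    return v


def spectral_scores(features):
    """Squared projection of each centered feature vector onto the top direction."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(dim)]
    centered = [[f[j] - mean[j] for j in range(dim)] for f in features]
    v = top_direction(centered)
    return [sum(c[j] * v[j] for j in range(dim)) ** 2 for c in centered]


def flag_outliers(features, num_to_remove):
    """Return the indices of the `num_to_remove` highest-scoring examples."""
    scores = spectral_scores(features)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:num_to_remove])


# Synthetic stand-in for learned representations (hypothetical data):
# 190 clean vectors plus 10 "poisoned" vectors shifted along one coordinate.
rng = random.Random(1)
clean = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(190)]
poisoned = [[rng.gauss(0.0, 1.0) + (10.0 if j == 0 else 0.0) for j in range(8)]
            for _ in range(10)]
features = clean + poisoned  # poisoned examples occupy indices 190..199

scores = spectral_scores(features)
flagged = flag_outliers(features, num_to_remove=10)
print(sorted(flagged))
```

In this toy setup the planted shift dominates the variance of the data, so the flagged indices should coincide with the planted ones. On real representations the separation is not guaranteed to be this clean, and the removal budget would have to be chosen from an estimate of the poisoned fraction.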
Abstract
A recent line of work has uncovered a new form of data poisoning: so-called backdoor attacks. These attacks are particularly dangerous because they do not affect a network's behavior on typical, benign data. Rather, the network only deviates from its expected output when triggered by a perturbation planted by an adversary. In this paper, we identify a new property of all known backdoor attacks, which we call spectral signatures. This property allows us to utilize tools from robust statistics to thwart the attacks. We demonstrate the efficacy of these signatures in detecting and removing poisoned examples on real image sets and state-of-the-art neural network architectures. We believe that understanding spectral signatures is a crucial first step towards designing ML systems secure against such backdoor attacks.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Cryptographic Implementations and Security
