A Statistical Defense Approach for Detecting Adversarial Examples
Alessandro Cennamo, Ido Freeman, Anton Kummert

TL;DR
This paper proposes a statistical detection method for adversarial examples in deep neural networks, using classifier prediction signatures and class-specific statistics to reliably identify malicious inputs, outperforming existing methods.
Contribution
The paper introduces a novel statistical detection system that exploits training-set information and classifier signatures to detect adversarial examples more effectively than prior approaches.
Findings
Outperforms state-of-the-art detection methods
Reliable detection across various settings
Complementary to other defense strategies
Abstract
Adversarial examples are maliciously modified inputs created to fool deep neural networks (DNN). The discovery of such inputs presents a major issue to the expansion of DNN-based solutions. Many researchers have already contributed to the topic, providing both cutting-edge attack techniques and various defensive strategies. In this work, we focus on the development of a system capable of detecting adversarial samples by exploiting statistical information from the training set. Our detector computes several distorted replicas of the test input, then collects the classifier's prediction vectors to build a meaningful signature for the detection task. Then, the signature is projected onto the class-specific statistic vector to infer the input's nature. The classification output of the original input is used to select the class-statistic vector. We show that our method reliably detects…
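The pipeline described in the abstract (distorted replicas, prediction signature, projection onto a class-specific statistic vector) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say which distortions are used, how the signature is assembled, or how the class statistics are computed. Here the distortion is assumed to be additive Gaussian noise, the signature is assumed to be the concatenated prediction vectors, the class statistic is assumed to be the normalized mean training signature of each class, and `toy_classifier` is a hypothetical linear-softmax stand-in for the DNN.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def toy_classifier(x, W):
    """Hypothetical linear-softmax model standing in for the DNN."""
    return softmax(x @ W)


def signature(x, W, n_replicas=8, sigma=0.1, seed=0):
    """Build a signature from predictions on distorted replicas of x.

    Assumption: the distortion is additive Gaussian noise and the
    signature is the flattened stack of prediction vectors.
    """
    rng = np.random.default_rng(seed)
    replicas = x + sigma * rng.standard_normal((n_replicas, x.shape[0]))
    preds = toy_classifier(replicas, W)  # shape: (n_replicas, n_classes)
    return preds.ravel()


def class_statistics(X_train, y_train, W, n_classes):
    """One statistic vector per class, estimated from the training set.

    Assumption: the statistic is the unit-normalized mean signature.
    """
    stats = []
    for c in range(n_classes):
        sigs = np.stack([signature(x, W) for x in X_train[y_train == c]])
        mean = sigs.mean(axis=0)
        stats.append(mean / np.linalg.norm(mean))
    return np.stack(stats)


def detection_score(x, W, stats):
    """Project the signature onto the statistic of the predicted class."""
    predicted = int(np.argmax(toy_classifier(x, W)))
    score = float(signature(x, W) @ stats[predicted])
    return score, predicted
```

In a setup like this, a low score would suggest that the input's signature disagrees with what the training set looks like for its predicted class. How the score is turned into a decision (for example, a threshold calibrated on held-out clean data) is a further assumption that the abstract does not cover.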
