MagNet: a Two-Pronged Defense against Adversarial Examples
Dongyu Meng, Hao Chen

TL;DR
MagNet is a defense framework that detects adversarial examples and reforms inputs before classification, without modifying the protected classifier. It learns the manifold of normal data rather than the behavior of any particular attack, so it generalizes across attacks.
Contribution
MagNet pairs one or more detector networks with a reformer network, each of which learns the manifold of normal data. The defense therefore needs no knowledge of how adversarial examples are generated and no attack-specific modifications.
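The detector-plus-reformer idea above can be illustrated with a small sketch. This is not the authors' implementation: it swaps in a PCA-based linear autoencoder as a stand-in for a trained network, and the names `LinearAutoencoder`, `detect`, and `defend`, the reconstruction-error criterion, and the `-1` rejection label are all assumptions made for illustration.

```python
import numpy as np


class LinearAutoencoder:
    """PCA-based stand-in for a trained autoencoder (an assumption, not MagNet's network)."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        # Fit on normal examples only; no adversarial examples are needed.
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        # The top principal directions approximate the manifold of normal data.
        self.components_ = vt[: self.n_components]
        return self

    def reconstruct(self, X):
        # Project onto the learned subspace and map back to input space.
        codes = (X - self.mean_) @ self.components_.T
        return codes @ self.components_ + self.mean_


def detect(ae, X, threshold):
    """Flag inputs whose reconstruction error exceeds the threshold."""
    errors = np.linalg.norm(X - ae.reconstruct(X), axis=1)
    return errors > threshold


def defend(ae, classifier, X, threshold):
    """Reject flagged inputs, then reform and classify the rest.

    The classifier is called as-is; it is never retrained or modified.
    """
    rejected = detect(ae, X, threshold)
    labels = np.full(len(X), -1)  # -1 marks a rejected input (hypothetical convention)
    accepted = ~rejected
    if accepted.any():
        labels[accepted] = classifier(ae.reconstruct(X[accepted]))
    return labels
```

In this sketch one model plays both roles for brevity. Inputs far from the learned subspace are rejected, while inputs close to it are projected back onto it before the unmodified classifier sees them.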
Findings
Effective against state-of-the-art blackbox and graybox attacks
Maintains low false positive rate on normal examples
Provides robust generalization without attack-specific training
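One way a detector can keep its false positive rate low is to calibrate its rejection threshold on held-out normal examples only. The sketch below assumes the detector produces a scalar score per input (for example a reconstruction error); the summary above does not say how the threshold is chosen, so the quantile rule and the names `calibrate_threshold` and `false_positive_rate` are hypothetical.

```python
import numpy as np


def calibrate_threshold(normal_scores, target_fpr):
    """Pick a threshold so that roughly target_fpr of normal scores exceed it."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    # The (1 - target_fpr) quantile leaves about target_fpr of the scores above it.
    return float(np.quantile(normal_scores, 1.0 - target_fpr))


def false_positive_rate(normal_scores, threshold):
    """Fraction of normal examples that the detector would wrongly reject."""
    return float(np.mean(np.asarray(normal_scores) > threshold))
```

Because the calibration uses only normal data, it requires no adversarial examples. The rate achieved on new data is approximate and depends on how representative the held-out set is.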
Abstract
Deep learning has shown promising results on hard perceptual problems in recent years. However, deep learning systems are found to be vulnerable to small adversarial perturbations that are nearly imperceptible to humans. Such specially crafted perturbations cause deep learning systems to output incorrect decisions, with potentially disastrous consequences. These vulnerabilities hinder the deployment of deep learning systems where safety or security is important. Attempts to secure deep learning systems either target specific attacks or have been shown to be ineffective. In this paper, we propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet does not modify the protected classifier or know the process for generating adversarial examples. MagNet includes one or more separate detector networks and a reformer network. Different from…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
