Guided Diffusion Model for Adversarial Purification
Jinyi Wang, Zhaoyang Lyu, Dahua Lin, Bo Dai, Hongfei Fu

TL;DR
This paper introduces GDMP, a diffusion-based purification method that effectively defends against adversarial attacks on image classifiers by removing perturbations through guided denoising, significantly enhancing robustness.
Contribution
The paper proposes a novel diffusion-based purification technique, GDMP, integrating adversarial defense into the diffusion process to improve classifier robustness against attacks.
Findings
GDMP reduces adversarial perturbations effectively.
GDMP improves robust accuracy by 5% on CIFAR10.
GDMP achieves 70.94% robustness on ImageNet.
Abstract
With wider application of deep neural networks (DNNs) in various algorithms and frameworks, security threats have become one of the concerns. Adversarial attacks disturb DNN-based image classifiers, in which attackers can intentionally add imperceptible adversarial perturbations to input images to fool the classifiers. In this paper, we propose a novel purification approach, referred to as guided diffusion model for purification (GDMP), to help protect classifiers from adversarial attacks. The core of our approach is to embed purification into the diffusion denoising process of a Denoising Diffusion Probabilistic Model (DDPM), so that its diffusion process can submerge the adversarial perturbations in gradually added Gaussian noise, and both kinds of noise can be removed simultaneously by a guided denoising process. In our comprehensive experiments across various datasets,…
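The "submerging" idea in the abstract can be illustrated with the standard DDPM forward process, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I). The sketch below is a minimal illustration and not the authors' implementation: the linear beta schedule, the [0, 1] pixel range, the 8/255 perturbation budget, the step t = 50, and all function names are assumptions chosen for the example. It covers only the forward (noising) half; the guided denoising half requires a trained diffusion model and is omitted.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bar, noise):
    """Sample x_t from q(x_t | x_0) using a caller-supplied Gaussian noise draw."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def perturbation_to_noise_ratio(eps, t, alpha_bar):
    """Scaled perturbation magnitude divided by the Gaussian noise std at step t."""
    return np.sqrt(alpha_bar[t]) * eps / np.sqrt(1.0 - alpha_bar[t])

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Stand-in "image" and a worst-case L-infinity perturbation (hypothetical values).
x_clean = rng.uniform(0.0, 1.0, size=(3, 32, 32))
eps = 8.0 / 255.0
delta = eps * rng.choice([-1.0, 1.0], size=x_clean.shape)
x_adv = x_clean + delta

# Diffuse both inputs with the same noise draw so only the perturbation differs.
t = 50
noise = rng.standard_normal(x_clean.shape)
xt_clean = forward_diffuse(x_clean, t, alpha_bar, noise)
xt_adv = forward_diffuse(x_adv, t, alpha_bar, noise)

# What is left of the perturbation after diffusion, and how it compares to the noise.
residual = np.abs(xt_adv - xt_clean).max()
ratio_early = perturbation_to_noise_ratio(eps, 5, alpha_bar)
ratio_late = perturbation_to_noise_ratio(eps, t, alpha_bar)

print(f"residual perturbation at t={t}: {residual:.5f} (original eps: {eps:.5f})")
print(f"perturbation/noise ratio at t=5: {ratio_early:.3f}, at t={t}: {ratio_late:.3f}")
```

The surviving perturbation is scaled by sqrt(ᾱ_t) while the Gaussian noise standard deviation grows as sqrt(1 − ᾱ_t), so the ratio falls as t increases. In this sketch that falling ratio is what "submerging" the perturbation means; the choice of t trades off how much of the perturbation is buried against how much image content is destroyed.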
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Diffusion
