Adversarial Guided Diffusion Models for Adversarial Purification
Guang Lin, Zerui Tao, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

TL;DR
This paper introduces an adversarial guided diffusion model that enhances adversarial purification by preserving semantic information and removing perturbations, significantly improving robustness across multiple datasets.
Contribution
The paper proposes a novel adversarial guidance method using an auxiliary neural network trained adversarially, which improves diffusion model-based adversarial purification without relying on pixel-level perturbation measures.
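As a rough illustration of what "trained adversarially" means in general, the sketch below runs adversarial training with single-step FGSM perturbations on a toy logistic-regression classifier. This is an assumption-laden stand-in: the summary does not give the auxiliary network's architecture, attack, or loss, so the model, the function names (`fgsm_perturb`, `adversarial_train`), and the hyperparameters here are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """FGSM: move each input by eps along the sign of the loss gradient w.r.t. the input."""
    p = sigmoid(x @ w + b)
    # For binary cross-entropy on a linear model, dL/dx = (p - y) * w.
    grad_x = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad_x)

def adversarial_train(x, y, eps=0.1, lr=0.5, steps=200):
    """Fit logistic regression on FGSM examples regenerated at every step (toy setting)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        x_adv = fgsm_perturb(x, y, w, b, eps)   # attack the current model
        p = sigmoid(x_adv @ w + b)
        w -= lr * x_adv.T @ (p - y) / len(y)    # gradient step on the adversarial batch
        b -= lr * np.mean(p - y)
    return w, b

# Toy usage: two separable clusters (hypothetical data).
x = np.array([[2.0, 2.0], [3.0, 2.0], [-2.0, -2.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w, b = adversarial_train(x, y)
```

Each training step requires an extra gradient computation to build the perturbed batch, which is one reason adversarial training costs more than standard training.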
Findings
Improves robust accuracy by up to 7.30% on CIFAR-10.
Effectively preserves semantic information while removing adversarial perturbations.
Enhances robustness of diffusion model-based adversarial purification methods.
Abstract
Diffusion model (DM) based adversarial purification (AP) has proven to be a powerful defense method that can remove adversarial perturbations and generate a purified example without threats. In principle, pre-trained DMs can only ensure that purified examples conform to the distribution of the training data, but they may inadvertently compromise the semantic information of input examples, leading to misclassification of purified examples. Recent advancements introduce guided diffusion techniques to preserve semantic information while removing the perturbations. However, this guidance often relies on distance measures between purified examples and diffused examples, which can also preserve perturbations in purified examples. To further unleash the robustness power of DM-based AP, we propose an adversarial guided diffusion model (AGDM) by introducing a novel adversarial guidance…
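To make the idea of guided purification concrete, here is a minimal sketch of the generic pattern the abstract refers to: diffuse the input forward, then take reverse steps whose mean is shifted along the gradient of a guidance log-probability (the standard classifier-guidance form). It is not the AGDM algorithm itself. The guidance term `toy_grad_log_p`, the placeholder denoiser mean, and the `scale` hyperparameter are assumptions chosen only for illustration.

```python
import numpy as np

def forward_diffuse(x0, alpha_bar_t, noise):
    """Closed-form forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

def guided_reverse_step(mean, sigma, grad_log_p, scale, noise):
    """One guided reverse step: shift the unguided mean along the guidance gradient.

    mean       : unguided reverse-step mean (normally produced by the pre-trained DM)
    sigma      : reverse-step standard deviation
    grad_log_p : gradient of the guidance log-probability w.r.t. x_t
    scale      : guidance strength (hypothetical hyperparameter)
    """
    return mean + scale * sigma**2 * grad_log_p + sigma * noise

def toy_grad_log_p(x_t, target):
    """Gradient of log N(target; x_t, I) w.r.t. x_t; a stand-in for a learned guidance network."""
    return target - x_t

# Toy usage with hypothetical values.
rng = np.random.default_rng(0)
x_adv = np.array([1.0, -2.0, 0.5])
x_t = forward_diffuse(x_adv, alpha_bar_t=0.9, noise=rng.standard_normal(3))
mean = x_t  # placeholder: a real pipeline would compute this from the denoiser
grad = toy_grad_log_p(x_t, target=np.zeros(3))
x_prev = guided_reverse_step(mean, 0.1, grad, scale=2.0, noise=rng.standard_normal(3))
```

In this form, everything specific to a given method lives in the guidance gradient. A distance-based guidance and an adversarially trained auxiliary network would plug into the same slot, which is why the choice of guidance determines whether perturbations are preserved or removed.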
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The motivation of the paper is very clear, which effectively delivers the main insight of this paper. 2. The proposed method is intuitive and easy to understand. 3. The guided sampling can be extended to continuous-time DMs, which means AGDM can be generalized to different DMs.
1. The proposed method may have very low efficiency. DM-based AP methods are very slow at the inference stage (as they are completely 'inference-time' defenses). On top of this, the paper proposes to train an auxiliary neural network using AT, which further increases the computational cost of both training (as AT is slow by nature) and inference (as adversarial guidance is introduced into the reverse process of DMs). 2. A follow-up weakness is: this paper did not rep
1. The proposed method seems to be reasonable. Combining the robust classifier with diffusion models has the potential to improve the robustness. 2. The experiments are relatively comprehensive, with several datasets and several attack methods included.
1. The illustrations are tricky. The diffusion step t is set to 70 in the experiments, while in Fig. 1 the step is set to 400. I recommend using the actual step for illustration to help readers comprehensively understand this work. Furthermore, how Figure 2 was created is not clear, and there is no experimental support for Figure 2. 2. There is no theoretical analysis showing why this process is better than other classifier-guided diffusion purification methods. 3. Lack of innovation and
- The limitations of existing related methods are thoroughly discussed, and the proposed AGDM is well-motivated. - The superiority of AGDM to existing diffusion-based AP methods indicates the significance of guidance for the diffusion model in AP.
- The notations and interpretations in Section 3.2 can be confusing. Specifically, the interpretation of $p_{\phi}(x' \mid x_t)$ (Lines 212-213) only concerns the semantic information of $x'$, but the notation itself seems to indicate that the specific pixel values of $x'$ are also concerned. If only the semantic information is considered, it should be something like $p_{\phi}(s(x') \mid x_t)$. - In Lines 277-278, it is stated that the auxiliary network is not required to be a robust classifier
1. The overall method is technically sound. AGDM proposes to train an auxiliary classifier via adversarial training, which is sound. Adversarial training could help the classifier recognize the adversarial sample, which could naturally generate the most robust latent features to defend against the adversarial perturbation. Leveraging this, the overall conditional generation process will be more robust against the adversarial perturbation and alleviate the trade-off. 2. Introducing adversarial t
1. The contributions of this paper are limited. One of the main contributions of this paper is how to calculate the adversarial guidance, which is well explored under the training-free conditional diffusion such as FreeDom. In the view of the FreeDom, it is just a multi-conditional guidance, which could be easy to calculate. Specifically, the adversarial guidance in Sec. Methods contains two parts: 1) The MSE in the latent space between the intermediate results $x_{t}$ and the adversarial samp
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Diffusion
