Test-time Adversarial Defense with Opposite Adversarial Path and High Attack Time Cost

Cheng-Han Yeh; Kuanchun Yu; Chun-Shien Lu

arXiv:2410.16805 · cs.LG · May 20, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a test-time adversarial defense that purifies inputs with a diffusion model along opposite adversarial paths. It also analyzes the trade-off between attack effectiveness and computational cost.

Contribution

Proposes a diffusion-based purification method that leverages opposite adversarial paths for improved test-time defense, addressing limitations of prior training-time defenses.

Findings

01 · Resistance to adversarial attacks is demonstrated empirically

02 · Time-complexity analysis shows a trade-off between attack effectiveness and computational cost

03 · The purifier can be plugged into pre-trained models

Abstract

Deep learning models are known to be vulnerable to adversarial attacks that inject carefully designed perturbations into input data. Training-time defenses still exhibit a significant gap between natural accuracy and robust accuracy. In this paper, we investigate a new test-time adversarial defense method via diffusion-based recovery along opposite adversarial paths (OAPs). We present a purifier that can be plugged into a pre-trained model to resist adversarial attacks. Unlike prior work, the key idea is excessive denoising, or purification: the opposite adversarial direction is integrated with reverse diffusion to push the input image further toward the opposite adversarial direction. For the first time, we also illustrate the pitfall of conducting AutoAttack (Rand) on diffusion-based defense methods. Through the lens of time complexity, we examine the trade-off…
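The notion of an "opposite adversarial direction" can be illustrated on a toy model. The sketch below is not the authors' purifier: it assumes a hypothetical linear softmax classifier (`W`, `b`), a known label `y`, and a signed-gradient update with illustrative values for `step` and `num_steps`. It only contrasts a PGD-style path, which ascends the loss with respect to the input, with the opposite path, which descends it. As one reviewer notes, the true label is unavailable at inference time, which is why the paper emulates this movement through reverse diffusion rather than computing it directly.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(W, b, x, y):
    return -np.log(softmax(W @ x + b)[y])

def input_gradient(W, b, x, y):
    # Gradient of the cross-entropy w.r.t. the input of a linear softmax
    # classifier: W^T (p - onehot(y)).
    p = softmax(W @ x + b)
    p[y] -= 1.0
    return W.T @ p

def signed_path(W, b, x, y, step, num_steps, direction):
    """Follow the signed input gradient for num_steps steps.

    direction=+1 ascends the loss (PGD-style adversarial path);
    direction=-1 descends it (the opposite adversarial path).
    """
    x_k = x.copy()
    for _ in range(num_steps):
        g = input_gradient(W, b, x_k, y)
        x_k = np.clip(x_k + direction * step * np.sign(g), 0.0, 1.0)
    return x_k

# Hypothetical toy setup: 3 classes, 8 input features in [0, 1].
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
b = np.zeros(3)
x = rng.uniform(0.2, 0.8, size=8)
y = 1

x_adv = signed_path(W, b, x, y, step=0.01, num_steps=5, direction=+1)
x_opp = signed_path(W, b, x, y, step=0.01, num_steps=5, direction=-1)

loss_clean = cross_entropy(W, b, x, y)
loss_adv = cross_entropy(W, b, x_adv, y)
loss_opp = cross_entropy(W, b, x_opp, y)
print(f"clean={loss_clean:.4f}  adversarial={loss_adv:.4f}  opposite={loss_opp:.4f}")
```

In this toy setting the adversarial path raises the loss and the opposite path lowers it, while both stay within the same per-coordinate budget of `step * num_steps`.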

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The proposed plug-and-play purifier can be integrated into pre-trained models, improving their robustness against adversarial attacks.
2. The method increases the time required for attackers to generate adaptive adversarial examples.
3. The paper also critiques the use of AutoAttack (Rand) for evaluating diffusion-based defenses and highlights the trade-off between attack effectiveness and computational complexity in adversarial robustness evaluations.

Weaknesses

1. The authors' explanation of their method is not clear enough. There is no overall algorithm description (e.g., for Section 3.3), which makes the paper difficult to follow.
2. The explanation of Table 2 is not persuasive. Why does robustness decrease as K increases? This seems to contradict the assumption in Figure 1. Could you provide a more detailed explanation of this apparent contradiction and discuss potential reasons for the decrease in robustness as K increases?
3. Th

Reviewer 02 · Rating 5 · Confidence 4

Strengths

- The idea of reducing the effect of adversarial perturbations by perturbing the input in a way that minimizes the training loss makes sense. This process pushes the adversarial input toward a non-adversarial region. The idea cannot be applied directly because the ground-truth label is not available during inference. The proposed method emulates this process through an inverse diffusion process and achieves efficient denoising.
- The effectiveness of the proposed method is empirically dem

Weaknesses

- To demonstrate that the proposed method is effective against various attacks, it would be better to experiment with methods that converge to adversarial examples that differ from PGD and APGD, such as ACG and PGD-ODI. This is because PGD-based attacks highly depend on the initial point of the attack. The effectiveness of random initialization, such as Output Diversified Sampling (ODS) in adversarial attacks, implies the high dependency of PGD-based attacks on initial points. It is natural to a

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. This paper introduces a new method to alleviate the influence of adversarial noise, which proposes to leverage the opposite adversarial path and reverse diffusion.
2. The experiments are performed on three popular datasets, which provides extensive evaluation results.
3. Multiple related works are considered for comparison to present the gains brought by the proposed method.

Weaknesses

1. The statement in Section 3.1.2 contradicts the statement in Section 1.3, and the corresponding explanation is not convincing enough.
2. The proposed method considers only non-targeted adversarial attacks. For targeted attacks, the adversarial gradient varies with the specified target class, and the proposed method, with a single opposite adversarial path, does not seem able to handle this case.
3. The adversarial attacks used to evaluate the defense

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis · Physical Unclonable Functions (PUFs) and Hardware Security

Methods: Diffusion