NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion

Max Collins; Jordan Vice; Tim French; Ajmal Mian

arXiv:2505.20934 · cs.LG · March 4, 2026

Open Access · 3 Reviews

TL;DR

NatADiff introduces a novel diffusion-based method for generating natural adversarial samples that better mimic real-world errors, improving transferability and fidelity compared to existing techniques.

Contribution

The paper presents NatADiff, a diffusion-guided adversarial sampling scheme that improves both the transferability and the realism of adversarial examples against deep learning classifiers.

Findings

01. Achieves attack success rates comparable to those of state-of-the-art methods.

02. Produces adversarial samples that transfer better across models.

03. Generates samples that more closely resemble natural test-time errors, as measured by FID.
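Finding 03 relies on FID (Fréchet Inception Distance). FID is the Fréchet distance between Gaussians fitted to Inception-network features of two image sets: d² = ‖μ_a − μ_b‖² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}). Lower values mean the two sets are statistically closer. The following is a minimal NumPy sketch of that formula applied to precomputed feature arrays. It is a generic illustration and not the evaluation code used in the paper. It assumes the Inception feature extraction happens elsewhere, and the helper names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (n_samples, dim) feature arrays.

    Hypothetical helper: feats_a / feats_b would be e.g. Inception pool features
    of real and generated images, extracted beforehand.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # atleast_2d keeps the single-feature case as a 1x1 covariance matrix
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^{1/2}) == Tr((S cov_b S)^{1/2}) with S = cov_a^{1/2};
    # the right-hand form is symmetric PSD, so the eigh-based root applies.
    root_a = _sqrtm_psd(cov_a)
    cross = _sqrtm_psd(root_a @ cov_b @ root_a)
    diff = mu_a - mu_b
    value = diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross)
    return float(max(value, 0.0))  # clamp small negative round-off
```

Computing the cross term through S cov_b S rather than cov_a cov_b keeps every matrix square root symmetric, so the sketch avoids the complex-valued results that a general `sqrtm` can return on nearly singular covariances.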

Abstract

Adversarial samples exploit irregularities in the manifold 'learned' by deep learning models to cause misclassifications. The study of these adversarial samples provides insight into the features a model uses to classify inputs, which can be leveraged to improve robustness against future attacks. However, much of the existing literature focuses on constrained adversarial samples, which do not accurately reflect test-time errors encountered in real-world settings. To address this, we propose 'NatADiff', an adversarial sampling scheme that leverages denoising diffusion to generate natural adversarial samples. Our approach is based on the observation that natural adversarial samples frequently contain structural elements from the adversarial class. Deep learning models can exploit these structural elements to shortcut the classification process, rather than learning to genuinely…

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

(S1) Vastly Superior Transferability: The paper's primary strength lies in its demonstration of vastly superior transferability. As shown in Table 1, the proposed NatADiff significantly outperforms all competitors, including SOTA diffusion-based attacks like AdvClass and DiffAttack, in average transfer ASR. The ability to successfully attack a ViT-H model with samples generated from a CNN (RN-50) at such a high success rate strongly suggests the method identifies fundamental, architecture-agnosti…

Weaknesses

(W1) Inconsistent motivation for Similarity Targeting: The authors motivate the use of similarity targeting (U) by stating that it "outperform[s] targeted attacks (T)". While this holds true for CNN surrogates (RN-50, Inc-v3), the paper fails to acknowledge or analyze the contradictory result from the ViT-H surrogate, where the random targeted attack (T) significantly outperforms the similarity-based untargeted attack (U) in average ASR (73.2% vs 69.7%). This omission weakens the claim that simi…
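Reviewer 01 compares average attack success rates (ASR) for the targeted (T) and similarity-based untargeted (U) settings. The following is a minimal sketch of how such rates are commonly computed. It assumes that an untargeted attack succeeds when the victim's prediction differs from the true label, that a targeted attack succeeds when the prediction equals the requested target, and that "average transfer ASR" is the plain mean over victim models other than the surrogate. The paper's exact averaging convention is not stated in this summary. The function names and model names are hypothetical, and the numbers are toy values, not results from the paper.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def attack_success_rate(preds: Sequence[int],
                        true_labels: Sequence[int],
                        targets: Optional[Sequence[int]] = None) -> float:
    """Fraction of adversarial samples that fool one victim classifier.

    Untargeted (targets is None): success means prediction != true label.
    Targeted: success means prediction == requested target class.
    """
    if len(preds) == 0 or len(preds) != len(true_labels):
        raise ValueError("preds and true_labels must be non-empty and equally long")
    if targets is None:
        hits = [p != y for p, y in zip(preds, true_labels)]
    else:
        if len(targets) != len(preds):
            raise ValueError("targets must match preds in length")
        hits = [p == t for p, t in zip(preds, targets)]
    return sum(hits) / len(hits)


def average_transfer_asr(asr_by_model: Dict[str, float], surrogate: str) -> float:
    """Mean ASR over victim models, excluding the surrogate (assumed convention)."""
    transfer = [asr for name, asr in asr_by_model.items() if name != surrogate]
    if not transfer:
        raise ValueError("need at least one victim model besides the surrogate")
    return mean(transfer)


# Toy usage with made-up predictions and scores:
untargeted = attack_success_rate([3, 1, 7, 7], [1, 1, 2, 5])               # 0.75
targeted = attack_success_rate([3, 1, 7, 7], [1, 1, 2, 5], [3, 4, 7, 0])  # 0.5
avg = average_transfer_asr(
    {"surrogate_net": 0.98, "victim_a": 0.6, "victim_b": 0.4}, "surrogate_net"
)                                                                          # 0.5
```

The two definitions reward different outcomes. A prediction that lands on any wrong class counts as an untargeted success but as a targeted failure unless it hits the requested class. This difference is one reason T and U averages are not directly interchangeable.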

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- Novel integration of adversarial boundary guidance and classifier augmentations in diffusion models.
- Strong empirical results: high transferability and competitive white-box performance.
- Well-motivated by the link between contextual cues and natural adversarial samples.
- Comprehensive evaluation across multiple architectures and adversarial defenses.

Weaknesses

- Computationally expensive due to iterative diffusion sampling.
- Limited to ImageNet; evaluation on more specialized domains is future work.
- Similarity targeting may lead to subtle misclassifications (e.g., between similar classes).

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. This paper connects the concept of adversarial boundary guidance with the generation of natural adversarial examples, formalizing the intuition that natural errors occur due to overreliance on contextual cues. This is a well-thought-out approach.
2. The method demonstrates excellent technical depth and implementation. It combines time-travel sampling, classifier-free guidance, gradient normalization, and boosting; ablation experiments are conducted for each component to evaluate its contribution…
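Reviewer 03 lists classifier-free guidance and gradient normalization among the method's components. The following sketch shows one generic way these two ingredients can be combined in a single noise estimate. It applies classifier-free guidance on the text condition and then adds a classifier gradient toward the adversarial class, rescaled to unit norm and applied in the standard epsilon-form classifier guidance (ε̂ = ε − √(1 − ᾱ_t) · s · ∇ log p(ỹ | x_t)). This is a hypothetical illustration, not NatADiff's update rule. The function name, the unit-norm normalization, and the parameters `cfg_scale` and `adv_scale` are all assumptions. Time-travel sampling and boosting are omitted.

```python
import numpy as np


def guided_noise_estimate(eps_uncond: np.ndarray,
                          eps_cond: np.ndarray,
                          adv_log_prob_grad: np.ndarray,
                          cfg_scale: float,
                          adv_scale: float,
                          alpha_bar_t: float) -> np.ndarray:
    """Hypothetical noise estimate mixing classifier-free and normalized classifier guidance.

    eps_uncond / eps_cond: denoiser outputs without / with the text condition.
    adv_log_prob_grad: gradient of log p(adversarial class | x_t) w.r.t. x_t.
    alpha_bar_t: cumulative noise-schedule product at step t, in (0, 1].
    """
    # Classifier-free guidance: extrapolate from the unconditional toward the conditional estimate.
    eps = eps_uncond + cfg_scale * (eps_cond - eps_uncond)
    # Gradient normalization (assumed form): unit L2 norm, so adv_scale alone sets the step size.
    grad_norm = np.linalg.norm(adv_log_prob_grad)
    unit_grad = adv_log_prob_grad / max(grad_norm, 1e-12)
    # Epsilon-form classifier guidance: subtracting sqrt(1 - alpha_bar_t) * grad
    # moves the implied clean image toward higher adversarial-class probability.
    sigma_t = np.sqrt(1.0 - alpha_bar_t)
    return eps - adv_scale * sigma_t * unit_grad
```

In a sampler loop, this estimate would replace the plain denoiser output at each step. Normalizing the gradient decouples the adversarial push from the classifier's raw gradient magnitude, which can vary by orders of magnitude across timesteps.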

Weaknesses

1. The manuscript provides no ablation study or discussion on the robustness of the adversarial boundary guidance to variations in the textual implementation of the intersection prompt `y ∩ ỹ`. The stability of results across different prompt engineering strategies (e.g., varying templates or phrasing) remains entirely unexplored.
2. The appendix mentions that the adversarial guidance strength was "manually tuned" and notes that s behaves close to binarization (the attack succeeds only after re…

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Diffusion