NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion
Max Collins, Jordan Vice, Tim French, Ajmal Mian

TL;DR
NatADiff introduces a novel diffusion-based method for generating natural adversarial samples that better mimic real-world errors, improving transferability and fidelity compared to existing techniques.
Contribution
The paper presents NatADiff, a diffusion-guided adversarial sampling scheme that enhances attack transferability and realism of adversarial examples in deep learning.
Findings
Achieves attack success rates comparable to state-of-the-art methods.
Produces adversarial samples with higher transferability across models.
Generates samples that better resemble natural test-time errors as measured by FID.
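FID (Fréchet Inception Distance) fits a Gaussian to the Inception-v3 features of each image set and measures the Fréchet distance between the two Gaussians. Lower values mean the sets are statistically closer. The following is a minimal NumPy sketch of that formula. It assumes the features have already been extracted, since the Inception-v3 feature extractor is not shown. The reference set NatADiff is compared against is not specified in this summary.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) is the sum of square roots of the eigenvalues of S1 S2;
    # for PSD covariances these eigenvalues are real and non-negative (up to
    # round-off), so tiny negative or imaginary parts are discarded.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

def fid_from_features(feats_a, feats_b):
    """FID from two (num_images, feature_dim) arrays of precomputed features."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    return frechet_distance(mu_a, cov_a, mu_b, cov_b)
```

For example, two Gaussians with identity covariance whose means differ by `[3, 4]` have a distance of 25, because only the squared mean gap contributes.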
Abstract
Adversarial samples exploit irregularities in the manifold 'learned' by deep learning models to cause misclassifications. The study of these adversarial samples provides insight into the features a model uses to classify inputs, which can be leveraged to improve robustness against future attacks. However, much of the existing literature focuses on constrained adversarial samples, which do not accurately reflect test-time errors encountered in real-world settings. To address this, we propose 'NatADiff', an adversarial sampling scheme that leverages denoising diffusion to generate natural adversarial samples. Our approach is based on the observation that natural adversarial samples frequently contain structural elements from the adversarial class. Deep learning models can exploit these structural elements to shortcut the classification process, rather than learning to genuinely…
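One way to picture "adversarial boundary guidance" is a single denoising step that combines two signals. Classifier-free guidance steers the sample toward a prompt mixing the true class y and the adversarial class ỹ, which is the intersection prompt y ∩ ỹ mentioned in the reviews. A normalised gradient from the attacked classifier then pushes the sample toward ỹ. The sketch below is hypothetical and is not the authors' implementation. The function name, the unit-norm gradient normalisation, and the exact combination rule are assumptions, and the noise predictions and gradient are passed in as precomputed arrays.

```python
import numpy as np

def guided_noise(eps_uncond, eps_inter, grad_log_p_adv, w=7.5, s=1.0, sigma_t=1.0):
    """Hypothetical guided noise estimate for one denoising step (assumed form).

    eps_uncond:     noise prediction with an empty prompt
    eps_inter:      noise prediction conditioned on the intersection prompt (y ∩ ỹ)
    grad_log_p_adv: gradient of log p_clf(ỹ | x_t) from the attacked classifier
    w:              classifier-free guidance scale
    s:              adversarial guidance strength
    sigma_t:        noise level at step t, e.g. sqrt(1 - alpha_bar_t)
    """
    # Classifier-free guidance toward the prompt that mixes y and ỹ.
    eps = eps_uncond + w * (eps_inter - eps_uncond)
    # Assumed normalisation: rescale the gradient to unit norm so that s,
    # not the gradient's raw magnitude, sets the strength of the push.
    g = grad_log_p_adv / (np.linalg.norm(grad_log_p_adv) + 1e-12)
    # Standard classifier-guidance form: subtract the scaled gradient
    # from the noise estimate to raise p_clf(ỹ | x).
    return eps - s * sigma_t * g
```

Separating the two terms keeps the text prompt responsible for *what* appears in the image (structural elements of both classes), while s controls how hard the classifier gradient pushes across the decision boundary.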
Peer Reviews
Decision: ICLR 2026 Poster
(S1) Vastly Superior Transferability: The paper's primary strength lies in its demonstration of vastly superior transferability. As shown in Table 1, the proposed NatADiff significantly outperforms all competitors, including SOTA diffusion-based attacks like AdvClass and DiffAttack, in average transfer ASR. The ability to successfully attack a ViT-H model with samples generated from a CNN (RN-50) at such a high success rate strongly suggests the method identifies fundamental, architecture-agnosti
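"Average transfer ASR" is, roughly, the fraction of adversarial samples crafted on a surrogate model that also fool each other model, averaged over those models. The following is a minimal sketch under stated assumptions. It counts any misclassification of the source class as a success, which is the untargeted criterion, and it excludes the surrogate from the average because that case is white-box. Table 1's exact protocol may differ. The model names are reused from the review, and the prediction arrays are illustrative inputs.

```python
import numpy as np

def transfer_asr(preds_by_model, true_labels, surrogate):
    """Per-model and average transfer attack success rate (untargeted, assumed).

    preds_by_model: dict model_name -> predicted labels on the adversarial samples
    true_labels:    ground-truth (source) class of each sample
    surrogate:      model the samples were generated against; excluded from the
                    transfer average because attacking it is the white-box case
    """
    y = np.asarray(true_labels)
    per_model = {
        name: float(np.mean(np.asarray(p) != y))  # success = not the true class
        for name, p in preds_by_model.items()
        if name != surrogate
    }
    avg = float(np.mean(list(per_model.values()))) if per_model else float("nan")
    return per_model, avg

per_model, avg = transfer_asr(
    {"RN-50": [1, 1, 1, 1], "ViT-H": [0, 1, 1, 0], "Inc-v3": [1, 1, 1, 0]},
    true_labels=[0, 0, 0, 0],
    surrogate="RN-50",
)
```

With these toy predictions, ViT-H is fooled on half the samples and Inc-v3 on three quarters, so the average transfer ASR is 0.625.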
(W1) Inconsistent motivation for Similarity Targeting: The authors motivate the use of similarity targeting (U) by stating that it “outperform[s] targeted attacks (T)”. While this holds true for CNN surrogates (RN-50, Inc-v3), the paper fails to acknowledge or analyze the contradictory result from the ViT-H surrogate, where the random targeted attack (T) significantly outperforms the similarity-based untargeted attack (U) in average ASR (73.2% vs 69.7%). This omission weakens the claim that simi
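The two settings compared in this comment differ only in how the adversarial class ỹ is chosen. The sketch below illustrates that difference under an assumption: it reads "similarity targeting" as picking the class whose embedding is most cosine-similar to the true class. The review does not say which similarity NatADiff uses. The embedding matrix, for example text embeddings of class names, is hypothetical.

```python
import numpy as np

def pick_target(true_class, class_embeddings, mode="similar", rng=None):
    """Choose an adversarial class ỹ for a true class y (illustrative only).

    mode="random":  uniformly random class other than y (the "T" setting).
    mode="similar": class whose embedding has the highest cosine similarity
                    to y's (an assumed reading of similarity targeting, "U").
    class_embeddings: hypothetical (num_classes, dim) array of class embeddings.
    """
    n = class_embeddings.shape[0]
    if mode == "random":
        if rng is None:
            rng = np.random.default_rng()
        candidates = [c for c in range(n) if c != true_class]
        return int(rng.choice(candidates))
    # Normalise rows so the dot product equals cosine similarity.
    e = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    sims = e @ e[true_class]
    sims[true_class] = -np.inf  # never select the true class itself
    return int(np.argmax(sims))
```

Under this reading, similarity targeting asks the attack to cross a nearby decision boundary, which fits the reviewer's point elsewhere that it can produce subtle misclassifications between similar classes.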
- Novel integration of adversarial boundary guidance and classifier augmentations in diffusion models.
- Strong empirical results: high transferability and competitive white-box performance.
- Well-motivated by the link between contextual cues and natural adversarial samples.
- Comprehensive evaluation across multiple architectures and adversarial defenses.
- Computationally expensive due to iterative diffusion sampling.
- Limited to ImageNet; evaluation on more specialized domains is future work.
- Similarity targeting may lead to subtle misclassifications (e.g., between similar classes).
1. This paper connects the concept of adversarial boundary guidance with the generation of natural adversarial examples, formalizing the intuition that natural errors occur due to overreliance on contextual cues. This is a well-thought-out approach.
2. The method demonstrates excellent technical depth and implementation. It combines time-travel sampling, classifier-free guidance, gradient normalization, and boosting; ablation experiments are conducted for each component to evaluate its contribution
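"Time-travel sampling" usually refers to a resampling trick from the guided-diffusion literature. After a reverse step from x_t to x_{t-1}, the sample is pushed back to level t with the forward kernel q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I) and then denoised again, which gives the guidance several attempts per timestep. The sketch below assumes that formulation. It does not confirm NatADiff's exact schedule. `denoise_step` is a hypothetical stand-in for one guided reverse step.

```python
import numpy as np

def sample_with_time_travel(denoise_step, x_T, betas, n_travel=3, rng=None):
    """Reverse diffusion with time travel (resampling) at every timestep.

    denoise_step(x_t, t) -> x_{t-1}: hypothetical guided reverse step.
    betas: forward noise schedule, betas[t - 1] is beta_t for t = 1..T.
    n_travel: reverse passes per timestep (1 = ordinary sampling).
    """
    if n_travel < 1:
        raise ValueError("n_travel must be at least 1")
    if rng is None:
        rng = np.random.default_rng()
    x = x_T
    for t in range(len(betas), 0, -1):
        for k in range(n_travel):
            x_prev = denoise_step(x, t)
            if k < n_travel - 1:
                # Travel back in time: re-noise x_{t-1} to level t using the
                # forward kernel, then denoise it again on the next pass.
                beta = betas[t - 1]
                noise = rng.standard_normal(np.shape(x_prev))
                x = np.sqrt(1.0 - beta) * x_prev + np.sqrt(beta) * noise
        x = x_prev  # keep the result of the final pass and move to t - 1
    return x
```

Each extra pass costs one more network evaluation per timestep, so n_travel multiplies sampling time. This connects to the computational-cost concern raised in another review.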
1. The manuscript provides no ablation study or discussion on the robustness of the adversarial boundary guidance to variations in the textual implementation of the intersection prompt `y ∩ ỹ`. The stability of results across different prompt engineering strategies (e.g., varying templates or phrasing) remains entirely unexplored.
2. The appendix mentions that the adversarial guidance strength was "manually tuned" and notes that s behaves close to binarization (the attack succeeds only after re
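If success really does switch on sharply as s grows, manual tuning could be replaced by a bisection search for the smallest working s. The reviewer's near-binary description suggests that kind of threshold. The sketch below assumes success is monotone in s, which the review suggests but nothing here guarantees. `attack_succeeds` is a hypothetical callable that runs the sampler at strength s and checks the classifier's prediction. The default search range is an arbitrary placeholder.

```python
def find_guidance_threshold(attack_succeeds, s_lo=0.0, s_hi=64.0, tol=0.1):
    """Bisection for the smallest guidance strength s at which the attack succeeds.

    Assumes attack_succeeds(s) is False below some threshold and True above it.
    Returns a strength known to succeed that is within tol of the threshold,
    or None if even s_hi fails.
    """
    if attack_succeeds(s_lo):
        return s_lo
    if not attack_succeeds(s_hi):
        return None  # no success anywhere in the searched range
    # Invariant: s_lo fails, s_hi succeeds.
    while s_hi - s_lo > tol:
        mid = 0.5 * (s_lo + s_hi)
        if attack_succeeds(mid):
            s_hi = mid
        else:
            s_lo = mid
    return s_hi
```

The search needs about log2((s_hi - s_lo) / tol) sampling runs. Choosing the smallest working s also keeps the adversarial push no stronger than necessary.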
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Diffusion
