DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising

Zhenhao Li; Huichi Zhou; Marek Rei; Lucia Specia

arXiv:2407.00248 · cs.CL · May 20, 2025


TL;DR

DiffuseDef introduces an iterative denoising diffusion layer to enhance the robustness of language models against adversarial attacks, achieving state-of-the-art defense performance through a plug-and-play approach.

Contribution

The paper presents a novel diffusion-based adversarial defense method for language models, integrating denoising and ensembling to improve robustness against attacks.

Findings

01. Outperforms existing adversarial defense methods.
02. Achieves state-of-the-art results against both black-box and white-box attacks.
03. Integrates with existing models in a plug-and-play manner.

Abstract

Pretrained language models have significantly advanced performance across various natural language processing tasks. However, adversarial attacks continue to pose a critical challenge to systems built using these models, as they can be exploited with carefully crafted adversarial texts. Inspired by the ability of diffusion models to predict and reduce noise in computer vision, we propose a novel and flexible adversarial defense method for language classification tasks, DiffuseDef, which incorporates a diffusion layer as a denoiser between the encoder and the classifier. The diffusion layer is trained on top of the existing classifier, ensuring seamless integration with any model in a plug-and-play manner. During inference, the adversarial hidden state is first combined with sampled noise, then denoised iteratively and finally ensembled to produce a robust text representation. By…
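
The inference procedure the abstract describes (add sampled noise to the hidden state, denoise it iteratively, then ensemble) can be sketched in NumPy as below, together with a noise-prediction training loss for the diffusion layer. This is a minimal illustration under stated assumptions, not the authors' implementation. The linear DDPM noise schedule, the DDPM reverse update, averaging as the ensembling rule, and all names (`make_schedule`, `noise_prediction_loss`, `diffuse_def_representation`, `predict_noise`, `t`, `k`) are hypothetical choices, not details taken from the paper or the nickeilf/diffusedef repository. `predict_noise(x, s)` stands in for the trained diffusion layer, and `hidden` stands in for the encoder's (possibly adversarial) hidden state.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Assumed linear DDPM noise schedule (not necessarily the paper's)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bars[s] = alphas[0] * ... * alphas[s]
    return betas, alphas, alpha_bars


def noise_prediction_loss(hidden, predict_noise, schedule, rng):
    """Training sketch: noise a clean hidden state at a random step and score
    the hypothetical denoiser's noise estimate with mean squared error."""
    _, _, alpha_bars = schedule
    s = int(rng.integers(len(alpha_bars)))
    eps = rng.standard_normal(hidden.shape)
    noisy = np.sqrt(alpha_bars[s]) * hidden + np.sqrt(1.0 - alpha_bars[s]) * eps
    return float(np.mean((predict_noise(noisy, s) - eps) ** 2))


def diffuse_def_representation(hidden, predict_noise, schedule, t=5, k=8, seed=0):
    """Inference sketch: add sampled noise up to step t, denoise back over t
    reverse steps, repeat for k noise samples, and average (assumed ensembling)."""
    betas, alphas, alpha_bars = schedule
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(k):
        eps = rng.standard_normal(hidden.shape)
        # Forward process: jump straight to step t (0-based index t - 1).
        x = np.sqrt(alpha_bars[t - 1]) * hidden + np.sqrt(1.0 - alpha_bars[t - 1]) * eps
        # Reverse process: standard DDPM update for s = t-1, ..., 0.
        for s in reversed(range(t)):
            eps_hat = predict_noise(x, s)
            x = (x - betas[s] / np.sqrt(1.0 - alpha_bars[s]) * eps_hat) / np.sqrt(alphas[s])
            if s > 0:  # no fresh noise on the final step
                x = x + np.sqrt(betas[s]) * rng.standard_normal(x.shape)
        samples.append(x)
    return np.mean(samples, axis=0)  # ensembled representation passed to the classifier
```

As a sanity check, an oracle `predict_noise` that knows the clean hidden state makes every denoised sample land exactly on that state, so the ensemble returns it even when the input was perturbed. A learned diffusion layer can only approximate this behavior.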


Code & Models

Repositories

nickeilf/diffusedef (PyTorch, official)

Videos

DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising (Underline)

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Diffusion