Adversarial Robustification via Text-to-Image Diffusion Models
Daewon Choi, Jongheon Jeong, Huiwon Jang, Jinwoo Shin

TL;DR
This paper introduces a scalable, data-free method that uses text-to-image diffusion models to improve the adversarial robustness of neural classifiers, including CLIP.
Contribution
It proposes a novel, model-agnostic approach that uses text-to-image diffusion models as adaptable denoisers to improve adversarial robustness without any training data, outperforming prior data-dependent methods.
Findings
Improved provable adversarial robustness of CLIP classifiers.
Achieved robustness gains while maintaining classification accuracy.
Applicable to various visual classifiers beyond CLIP.
Abstract
Adversarial robustness has been conventionally believed as a challenging property to encode for neural networks, requiring plenty of training data. In the recent paradigm of adopting off-the-shelf models, however, access to their training data is often infeasible or not practical, while most of such models are not originally trained concerning adversarial robustness. In this paper, we develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data. Our intuition is to view recent text-to-image diffusion models as "adaptable" denoisers that can be optimized to specify target tasks. Based on this, we propose: (a) to initiate a denoise-and-classify pipeline that offers provable guarantees against adversarial attacks, and (b) to leverage a few synthetic reference images generated from the text-to-image model that enables novel adaptation schemes. Our…
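A denoise-and-classify pipeline of the kind described in the abstract can be illustrated with a small sketch. It assumes the provable guarantee comes from randomized smoothing (Gaussian noise, then denoising, then a majority vote over classifier outputs); the excerpt above does not name the certification technique, so treat that as an assumption. The names `smoothed_predict`, `l2_radius`, `toy_denoise`, and `toy_classify` are hypothetical, and the toy denoiser and classifier stand in for the diffusion model and CLIP.

```python
import numpy as np
from statistics import NormalDist


def smoothed_predict(x, denoise, classify, sigma=0.25, n=200, num_classes=2, seed=0):
    """Majority vote of classify(denoise(x + Gaussian noise)) over n noise draws."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        noisy = x + sigma * rng.standard_normal(x.shape)
        counts[classify(denoise(noisy))] += 1
    top = int(np.argmax(counts))
    return top, counts[top] / n  # predicted label and its empirical vote share


def l2_radius(p_lower, sigma):
    """L2 radius sigma * Phi^{-1}(p_lower), given a lower bound on the top-class probability."""
    if p_lower <= 0.5:
        return 0.0  # no certificate when the top class lacks a clear majority
    # Clamp below 1 so the inverse normal CDF stays finite.
    return sigma * NormalDist().inv_cdf(min(p_lower, 1.0 - 1e-9))


# Hypothetical stand-ins: a real pipeline would call a diffusion denoiser and a classifier.
def toy_denoise(img):
    return np.clip(img, 0.0, 1.0)


def toy_classify(img):
    return int(img.mean() > 0.5)


x = np.full((8, 8), 0.9)
label, vote_share = smoothed_predict(x, toy_denoise, toy_classify)
# Illustration only: vote_share is an empirical frequency, not a confidence bound.
radius = l2_radius(vote_share, sigma=0.25)
```

A rigorous certificate would replace the empirical vote share with a statistical lower confidence bound (for example, Clopper-Pearson) before computing the radius; the sketch skips that step for brevity.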
Taxonomy
Topics: Digital Media Forensic Detection · Adversarial Robustness in Machine Learning · Image Processing Techniques and Applications
Methods: Diffusion · Contrastive Language-Image Pre-training
