X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang, Sarah Erfani, Yige Li, Xingjun Ma, James Bailey

TL;DR
This paper introduces X-Transfer, a novel attack method that creates a universal adversarial perturbation capable of deceiving various CLIP models and vision-language systems across multiple domains and tasks, revealing a super transferability vulnerability.
Contribution
X-Transfer introduces surrogate scaling, a scalable technique for generating super transferable universal adversarial perturbations against CLIP models, and is reported to outperform previous methods.
Findings
X-Transfer achieves superior transferability across data, domains, models, and tasks.
The method significantly outperforms existing universal adversarial perturbation (UAP) techniques.
It establishes a new benchmark for adversarial attacks on CLIP.
Abstract
As Contrastive Language-Image Pre-training (CLIP) models are increasingly adopted for diverse downstream tasks and integrated into large vision-language models (VLMs), their susceptibility to adversarial perturbations has emerged as a critical concern. In this work, we introduce X-Transfer, a novel attack method that exposes a universal adversarial vulnerability in CLIP. X-Transfer generates a Universal Adversarial Perturbation (UAP) capable of deceiving various CLIP encoders and downstream VLMs across different samples, tasks, and domains. We refer to this property as super transferability: a single perturbation achieving cross-data, cross-domain, cross-model, and cross-task adversarial transferability simultaneously. This is achieved through surrogate scaling, a key innovation of our approach. Unlike existing methods that rely on fixed surrogate models,…
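To make the idea of a universal adversarial perturbation concrete, the following is a minimal sketch of a generic L-infinity-bounded UAP, not the X-Transfer algorithm or its surrogate scaling. Everything in it is an assumption for illustration: the "encoder" is a random linear map `W` standing in for an image encoder, the "images" are random vectors, and `eps`, `step`, and `n_steps` are arbitrary values. A single perturbation `delta` is optimized with signed gradient steps to lower the average cosine similarity between clean and perturbed embeddings on one batch, then checked on a held-out batch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_emb = 32, 16                 # toy input and embedding sizes (assumed)
eps, step, n_steps = 0.1, 0.01, 100  # L-inf budget, step size, iterations (assumed)

W = rng.normal(size=(d_emb, d_in))        # hypothetical linear stand-in for an encoder
train = rng.uniform(size=(64, d_in))      # toy "images" used to fit the perturbation
held_out = rng.uniform(size=(64, d_in))   # unseen toy "images" for evaluation


def mean_cos(delta, x):
    """Mean cosine similarity between embeddings of x + delta and x."""
    u = (x + delta) @ W.T
    v = x @ W.T
    cos = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    )
    return float(np.mean(cos))


def grad_mean_cos(delta, x):
    """Analytic gradient of mean_cos with respect to delta."""
    u = (x + delta) @ W.T
    v = x @ W.T
    nu = np.linalg.norm(u, axis=1, keepdims=True)
    nv = np.linalg.norm(v, axis=1, keepdims=True)
    uv = np.sum(u * v, axis=1, keepdims=True)
    dcos_du = v / (nu * nv) - uv * u / (nu**3 * nv)  # shape (n, d_emb)
    return (dcos_du @ W).mean(axis=0)                # chain rule through W


# Random start: the gradient is exactly zero at delta = 0, where cosine is 1.
delta = rng.uniform(-eps, eps, size=d_in)
for _ in range(n_steps):
    # Signed descent on similarity, then projection back into the L-inf ball.
    delta = np.clip(delta - step * np.sign(grad_mean_cos(delta, train)), -eps, eps)

clean_cos = mean_cos(np.zeros(d_in), held_out)
adv_cos = mean_cos(delta, held_out)
print(f"held-out cosine similarity: clean={clean_cos:.4f}, perturbed={adv_cos:.4f}")
```

The same `delta` is added to every input, which is what makes the perturbation universal; the held-out evaluation only illustrates the cross-data aspect on toy data. A real attack would replace `W` with actual CLIP encoders and would also keep perturbed pixels inside the valid image range, which this sketch omits.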
Code & Models
- 🤗 hanxunh/cpgc_clip_rn101_flicker30k · model · 7 dl
- 🤗 hanxunh/cpgc_clip_rn101_mscoco · model · 7 dl
- 🤗 hanxunh/cpgc_clip_vit_b16_flicker30k · model · 33 dl
- 🤗 hanxunh/cpgc_clip_vit_b16_mscoco · model · 12 dl
- 🤗 hanxunh/etu_clip_rn50_flickr30_uap · model · 6 dl
- 🤗 hanxunh/etu_clip_vit_b16_flickr30_uap · model · 16 dl
- 🤗 hanxunh/gd_uap_dl_resnet_msc_with_all_data · model · 17 dl
- 🤗 hanxunh/gd_uap_resnet_with_data · model · 7 dl
- 🤗 hanxunh/metauap_normalized_logits_ensemble_coco_meta · model · 20 dl
- 🤗 hanxunh/metauap_normalized_logits_ensemble_coco · model · 4 dl
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training
