Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features

Shangbo Wu; Yu-an Tan; Ruinan Ma; Wencong Ma; Dehua Zhu; Yuanzhang Li

arXiv:2506.21046 · cs.CV · October 31, 2025

TL;DR

This paper introduces dSVA, a novel attack method leveraging self-supervised Vision Transformer features from contrastive learning and masked image modeling to significantly improve the transferability of adversarial examples across different models.

Contribution

It proposes a dual self-supervised ViT feature-based generative attack framework that enhances black-box adversarial transferability by exploiting both global and local features.
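A generative attack trains a network to output perturbations instead of optimizing each image separately, so the raw generator output has to be mapped into an allowed perturbation budget. The snippet below is a minimal NumPy sketch of one common way to do this, not the dSVA code: the function name `bound_perturbation`, the tanh squashing, and the L∞ budget ε = 10/255 are all assumptions chosen for illustration.

```python
import numpy as np


def bound_perturbation(x, raw, eps=10 / 255):
    """Hypothetical helper (not from the dSVA repo): map unconstrained
    generator output `raw` to an adversarial image inside an L-infinity ball
    of radius `eps` around the clean image `x` (pixel values in [0, 1])."""
    # tanh squashes the raw output into (-1, 1), so delta lies in (-eps, eps).
    delta = eps * np.tanh(raw)
    # Clipping to the valid pixel range can only shrink |x_adv - x|,
    # so the L-infinity budget still holds afterwards.
    return np.clip(x + delta, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.random((3, 224, 224))               # stand-in clean image, CHW
    raw = 5.0 * rng.standard_normal(x.shape)    # stand-in generator output
    x_adv = bound_perturbation(x, raw)
    print(np.max(np.abs(x_adv - x)) <= 10 / 255 + 1e-12)  # True
```

Squashing with tanh keeps the mapping differentiable everywhere, which matters when the generator is trained by backpropagation; a hard clip of `x + raw` to `[x - eps, x + eps]` is an equally common alternative.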

Findings

1. dSVA outperforms state-of-the-art methods in black-box transferability.
2. Exploiting both contrastive learning (CL) and masked image modeling (MIM) features boosts attack success rates.
3. Self-supervised ViT features improve the generalizability of adversarial perturbations.
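Attack success rate (ASR) is the usual yardstick for transferability claims like these. The sketch below assumes one common definition, since the exact metric is not given in this excerpt: only samples that the black-box model classifies correctly on clean input are counted, and ASR is the fraction of those that are misclassified after perturbation. The function name `attack_success_rate` is hypothetical.

```python
import numpy as np


def attack_success_rate(labels, clean_preds, adv_preds):
    """Hypothetical helper implementing one common ASR definition (assumed):
    among samples the black-box model gets right on clean input, return the
    fraction it misclassifies on the adversarial input."""
    labels, clean_preds, adv_preds = map(np.asarray, (labels, clean_preds, adv_preds))
    correct = clean_preds == labels          # samples eligible to be "fooled"
    if not correct.any():
        return 0.0
    fooled = correct & (adv_preds != labels)
    return float(fooled.sum() / correct.sum())


if __name__ == "__main__":
    # 3 of 4 clean predictions are correct; 2 of those 3 flip under attack.
    print(attack_success_rate([0, 1, 2, 3], [0, 1, 2, 0], [5, 1, 7, 0]))  # 0.666...
```

Other papers instead report the plain error rate on adversarial inputs; the two coincide only when the model is perfectly accurate on the clean set.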

Abstract

The ability of deep neural networks (DNNs) comes from extracting and interpreting features from the data they are given. By exploiting intermediate features in DNNs instead of relying on hard labels, we craft adversarial perturbations that generalize more effectively, boosting black-box transferability. In previous work, these features have come almost exclusively from supervised learning. Inspired by the exceptional synergy between self-supervised learning and the Transformer architecture, this paper explores whether exploiting self-supervised Vision Transformer (ViT) representations can improve adversarial transferability. We present dSVA, a generative dual self-supervised ViT features attack that exploits both global structural features from contrastive learning (CL) and local textural features from masked image modeling (MIM), the two dominant self-supervised learning paradigms for ViTs. We design a novel…
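To illustrate what attacking intermediate features of two self-supervised encoders might look like, here is a hypothetical NumPy sketch of a dual feature-disruption objective. The abstract is truncated before the loss is defined, so the cosine-similarity form, the weights `alpha` and `beta`, and the function names are assumptions, not the dSVA loss. Random arrays stand in for token features (197 tokens × 768 dimensions, the shape a ViT-B/16 produces at 224×224) that would come from a CL-pretrained and a MIM-pretrained ViT.

```python
import numpy as np


def mean_token_cosine(f_clean, f_adv, eps=1e-8):
    """Mean cosine similarity between matching tokens of two (N, D) feature maps."""
    num = np.sum(f_clean * f_adv, axis=-1)
    den = np.linalg.norm(f_clean, axis=-1) * np.linalg.norm(f_adv, axis=-1) + eps
    return float(np.mean(num / den))


def dual_feature_loss(cl_clean, cl_adv, mim_clean, mim_adv, alpha=1.0, beta=1.0):
    """Hypothetical objective (assumed, not the paper's): weighted similarity
    between clean and adversarial features of a CL-pretrained ViT and a
    MIM-pretrained ViT. A generator would minimize this value, pushing
    adversarial features away from clean ones in both encoders at once."""
    return (alpha * mean_token_cosine(cl_clean, cl_adv)
            + beta * mean_token_cosine(mim_clean, mim_adv))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cl = rng.standard_normal((197, 768))    # stand-in CL-ViT token features
    mim = rng.standard_normal((197, 768))   # stand-in MIM-ViT token features
    print(dual_feature_loss(cl, cl, mim, mim))    # close to 2.0 (no disruption)
    print(dual_feature_loss(cl, -cl, mim, -mim))  # close to -2.0 (features inverted)
```

Cosine similarity is scale-invariant, so the two encoders contribute on a comparable footing even if their feature norms differ; an L2 distance or a per-layer weighting would be equally plausible choices under these assumptions.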

Code & Models

Repositories

spencerwooo/dsva (Official)

Models

NexusBohanLiu/dSVA (Hugging Face model)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection

Methods: Dropout · Absolute Position Encodings · Byte Pair Encoding · Softmax · Contrastive Learning · Masked Image Modeling · Label Smoothing · Transformer · Dense Connections · Layer Normalization