ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers

Hanwen Cao; Haobo Lu; Xiaosen Wang; Kun He

arXiv:2508.12384 · cs.CV · August 19, 2025

TL;DR

This paper introduces ViT-EnsembleAttack, an ensemble-based adversarial attack tailored for Vision Transformers (ViTs). It applies adversarial augmentation to the surrogate models so that the adversarial examples they produce transfer better to other models.

Contribution

It proposes what the authors describe as, to the best of their knowledge, the first ensemble attack designed specifically for ViTs, combining adversarial augmentation strategies with optimization modules to enhance attack transferability.

Findings

01 Outperforms existing ensemble attack methods on ViTs

02 Uses adversarial augmentation to improve transferability

03 Achieves significant attack success rate improvements

Abstract

Ensemble-based attacks have proven effective in enhancing adversarial transferability by aggregating the outputs of models with various architectures. However, existing research primarily focuses on refining ensemble weights or optimizing the ensemble path, and overlooks the ensemble models themselves as a way to enhance the transferability of adversarial attacks. To address this gap, we propose applying adversarial augmentation to the surrogate models, aiming to boost the overall generalization of ensemble models and reduce the risk of adversarial overfitting. Meanwhile, observing that ensembles of Vision Transformers (ViTs) have received less attention, we propose ViT-EnsembleAttack based on the idea of model adversarial augmentation, which is, to the best of our knowledge, the first ensemble-based attack method tailored for ViTs. Our approach generates augmented models for each surrogate ViT using three…
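To make "aggregating the outputs of models" concrete, here is a minimal sketch of a generic ensemble attack. It is not the ViT-EnsembleAttack method: it only averages the logits of several surrogates and takes a single signed-gradient (FGSM-style) step. Toy linear classifiers stand in for ViTs, and all names (`ensemble_logits`, `ensemble_fgsm`, `ce_loss`) are hypothetical, chosen for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ensemble_logits(weights, x):
    """Average the logits of several toy linear surrogates (logits = W @ x)."""
    return np.mean([W @ x for W in weights], axis=0)

def ce_loss(weights, x, label):
    """Cross-entropy of the averaged-logit ensemble for the given label."""
    return -np.log(softmax(ensemble_logits(weights, x))[label])

def ensemble_fgsm(weights, x, label, eps):
    """One signed-gradient step that increases the ensemble cross-entropy."""
    probs = softmax(ensemble_logits(weights, x))
    one_hot = np.zeros_like(probs)
    one_hot[label] = 1.0
    # d(loss)/d(logits) = probs - one_hot; averaging linear models gives
    # d(logits)/dx = mean of the weight matrices.
    w_mean = np.mean(weights, axis=0)
    grad_x = w_mean.T @ (probs - one_hot)
    return x + eps * np.sign(grad_x)

# Assumed toy setup: 4 surrogates, 3 classes, 8 input features.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 8)) for _ in range(4)]
x = rng.normal(size=8)
label = int(np.argmax(ensemble_logits(weights, x)))  # attack the current prediction
x_adv = ensemble_fgsm(weights, x, label, eps=0.1)
```

The perturbation is bounded by `eps` in the L-infinity norm. In a transfer setting, `x_adv` would then be evaluated on a held-out target model that was not part of `weights`.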
