On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression

Xinwei Zhang; Hangcheng Liu; Li Bai; Hao Wang; Qingqing Ye; Tianwei Zhang; Haibo Hu

arXiv:2601.21531·cs.CR·May 19, 2026

TL;DR

This paper studies the adversarial robustness of large vision-language models (LVLMs) under visual token compression. It shows that existing encoder-based attacks understate the vulnerabilities of compressed models, and it proposes a compression-aligned attack, CAGE, for evaluating their robustness more faithfully.

Contribution

The paper introduces CAGE, a novel attack method that aligns perturbation optimization with token compression inference, exposing robustness weaknesses in compressed LVLMs.

Findings

01. CAGE drives robust accuracy lower than baseline attacks do across various compression methods, which makes it the stronger attack.

02. Existing encoder-based attacks do not fully reveal vulnerabilities because of an optimization-inference mismatch.

03. Robustness assessments that ignore compression may be overly optimistic.

Abstract

Visual token compression is widely used to accelerate large vision-language models (LVLMs) by pruning or merging visual tokens, yet its adversarial robustness remains unexplored. We show that existing encoder-based attacks cannot fully disclose the robustness vulnerabilities of compressed LVLMs, due to an optimization-inference mismatch: perturbations are optimized on the full-token representation, while inference is performed through a token-compression bottleneck. To address this gap, we propose the Compression-AliGnEd attack (CAGE), which aligns perturbation optimization with compression inference without assuming access to the deployed compression mechanism or its token budget. CAGE combines (i) expected feature disruption, which concentrates distortion on tokens likely to survive across plausible budgets, and (ii) rank distortion alignment, which actively aligns token distortions…
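The optimization-inference mismatch described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' CAGE implementation. It assumes a hypothetical top-k pruning rule driven by per-token importance scores, a hypothetical set of candidate budgets, and made-up per-token distortion values. Under those assumptions it compares two perturbations that look identical on the full-token representation but differ sharply once only the surviving tokens are counted, and it shows how weighting distortion by an estimated survival probability separates them.

```python
# Toy illustration only: scores, budgets, and distortions are invented values,
# and top-k pruning is an assumed stand-in for a real compression method.

def top_k_indices(scores, k):
    """Indices of the k highest-scoring tokens (assumed pruning rule)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def survival_probability(scores, budgets):
    """Fraction of candidate budgets under which each token is kept."""
    counts = [0] * len(scores)
    for k in budgets:
        for i in top_k_indices(scores, k):
            counts[i] += 1
    return [c / len(budgets) for c in counts]


def full_token_distortion(distortion):
    """Distortion measured on every token, as if no compression happened."""
    return sum(distortion)


def compressed_distortion(distortion, scores, k):
    """Distortion that remains after pruning down to k tokens."""
    kept = top_k_indices(scores, k)
    return sum(distortion[i] for i in kept)


def expected_distortion(distortion, scores, budgets):
    """Distortion weighted by each token's estimated survival probability."""
    probs = survival_probability(scores, budgets)
    return sum(p * d for p, d in zip(probs, distortion))


# Hypothetical importance scores for six visual tokens.
scores = [0.9, 0.7, 0.4, 0.2, 0.1, 0.05]
# Two perturbations with the same total distortion on the full token set.
spread = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
focused = [3.0, 3.0, 0.0, 0.0, 0.0, 0.0]
# Hypothetical set of plausible token budgets.
budgets = [1, 2, 3]

for name, d in (("spread", spread), ("focused", focused)):
    print(
        name,
        "full:", full_token_distortion(d),
        "after top-2 pruning:", compressed_distortion(d, scores, 2),
        "survival-weighted:", round(expected_distortion(d, scores, budgets), 3),
    )
```

A full-token objective cannot tell the two perturbations apart, while the survival-weighted score prefers the one whose distortion lands on tokens likely to be kept. In this toy the weights come from a known score vector; the abstract states that CAGE does not assume access to the deployed compression mechanism or its token budget, so the real method has to estimate survival differently.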

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection