Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models
Xiaomei Zhang, Zhaoxi Zhang, Leo Yu Zhang, Yanjun Zhang, Guanhong Tao, Shirui Pan

TL;DR
This paper shows that visual token compression in large vision-language models improves efficiency but significantly reduces robustness: small input perturbations are enough to expose security vulnerabilities, revealing a critical efficiency-security trade-off.
Contribution
The study reveals the security risks of visual token compression in LVLMs, identifies instability in token importance ranking as the cause, and proposes a targeted attack method to exploit this vulnerability.
Findings
Token compression degrades model robustness.
Small perturbations can cause significant token ranking changes.
Compression introduces hidden security vulnerabilities.
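The ranking instability named in the findings can be illustrated with a toy sketch. This is not the paper's attack or any specific compression method: it assumes a hypothetical scheme that keeps the top-k tokens by a scalar importance score, and the score values and the names `top_k_indices` and `kept_overlap` are made up for illustration. When scores sit close together, a shift of about 0.02 per token is enough to change which tokens survive.

```python
def top_k_indices(scores, k):
    """Return the set of indices of the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def kept_overlap(scores_a, scores_b, k):
    """Fraction of the k kept tokens shared by two score vectors."""
    shared = top_k_indices(scores_a, k) & top_k_indices(scores_b, k)
    return len(shared) / k


# Hypothetical importance scores for six visual tokens (illustrative values only).
clean = [0.50, 0.49, 0.48, 0.47, 0.10, 0.05]
# The same scores after a small perturbation (about 0.02 per token at most).
perturbed = [0.48, 0.47, 0.50, 0.49, 0.10, 0.05]

k = 2
print("kept (clean):    ", sorted(top_k_indices(clean, k)))
print("kept (perturbed):", sorted(top_k_indices(perturbed, k)))
print("overlap:", kept_overlap(clean, perturbed, k))
```

In this toy case the kept set changes completely even though no score moved by much, while an uncompressed model would still see all six tokens. That is one way a failure could appear only when compression is enabled.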
Abstract
Visual token compression is widely adopted to improve the inference efficiency of Large Vision-Language Models (LVLMs), enabling their deployment in latency-sensitive and resource-constrained scenarios. However, existing work has mainly focused on efficiency and performance, while the security implications of visual token compression remain largely unexplored. In this work, we first reveal that visual token compression substantially degrades the robustness of LVLMs: models that are robust under uncompressed inference become highly vulnerable once compression is enabled. These vulnerabilities are state-specific; failure modes emerge only in the compressed setting and completely disappear when compression is disabled, making them particularly hidden and difficult to diagnose. By analyzing the key stages of the compression process, we identify instability in token importance ranking as the…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
