High-Entropy Tokens as Multimodal Failure Points in Vision-Language Models
Mengqi He, Xinyu Tian, Xin Shen, Jinhong Ni, Shu Zou, Zhaoyuan Yang, Jing Zhang

TL;DR
This paper identifies a small subset of high-entropy tokens as the key failure points of vision-language models under adversarial attack. Targeting them enables efficient, transferable perturbations that degrade model performance.
Contribution
The study shows that targeting a small fraction of high-entropy tokens is highly effective for adversarial attacks. It also introduces a simple entropy-guided attack method with high success rates and strong transferability.
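A hedged sketch of the two objectives being contrasted follows. The notation is assumed for illustration: image x, text prompt q, perturbation delta, budget epsilon, selected set S, and fraction rho. The prefix y_{<t} is also assumed to be held fixed to the clean output. The paper's exact loss, norm, and selection rule may differ.

```latex
% Entropy of the next-token distribution at decoding step t (assumed notation)
H_t(\delta) = -\sum_{v \in \mathcal{V}} p_\theta(v \mid x + \delta, q, y_{<t}) \,\log p_\theta(v \mid x + \delta, q, y_{<t})

% Global entropy attack: maximize uncertainty at every one of the T decoding steps
\max_{\|\delta\|_\infty \le \varepsilon} \; \sum_{t=1}^{T} H_t(\delta)

% Selective variant: only the set S of highest-entropy positions on the clean input,
% with |S| \approx \rho T and \rho \approx 0.2
\max_{\|\delta\|_\infty \le \varepsilon} \; \sum_{t \in S} H_t(\delta)
```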
Findings
High-entropy tokens account for a disproportionate share of adversarial influence.
Attacks on these tokens achieve semantic degradation comparable to global methods while optimizing fewer decoding positions.
The proposed method achieves 93-95% success rates across diverse models.
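Below is a minimal NumPy sketch of the selection step the findings describe. It computes the entropy of each decoding step's next-token distribution and keeps the highest-entropy fraction. The function names, the (T, V) logits layout, and the toy logits are assumptions for illustration. The 20% default mirrors the fraction reported in the abstract, but the paper's actual selection rule may differ.

```python
import numpy as np

def token_entropies(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) of the next-token distribution at each decoding step.

    logits: hypothetical array of shape (T, V), one row of vocabulary logits per position.
    """
    # Numerically stable log-softmax: subtract the row max before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)

def select_high_entropy_positions(entropies: np.ndarray, fraction: float = 0.2) -> np.ndarray:
    """Return the sorted indices of the top `fraction` of positions by entropy (at least one)."""
    k = max(1, int(round(fraction * entropies.size)))
    top = np.argsort(entropies)[::-1][:k]  # indices of the k largest entropies
    return np.sort(top)

# Toy usage: 10 decoding positions over a 50-token vocabulary (made-up values).
rng = np.random.default_rng(0)
logits = rng.normal(scale=4.0, size=(10, 50))
logits[3] = 0.0  # uniform distribution: the maximal entropy, log(50)
logits[7] = 0.0
H = token_entropies(logits)
print(select_high_entropy_positions(H, 0.2).tolist())  # → [3, 7]
```

Rounding rho * T keeps the selected set close to the requested fraction. The `max(1, ...)` guard ensures at least one position is always attacked.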
Abstract
Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, as a measure of model uncertainty, is highly correlated with VLM reliability. Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token contributes equally to model instability. In contrast, we reveal that in the representative open-source VLMs we evaluate, which span diverse architectures, a small fraction (around 20%) of high-entropy tokens concentrates a disproportionate share of adversarial influence during autoregressive generation. We demonstrate that concentrating adversarial perturbations on these high-entropy positions achieves semantic degradation comparable to that of global methods while optimizing fewer decoding positions. Additionally, across multiple representative VLMs, such attacks induce not only semantic drift but also a…
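The following toy NumPy sketch makes the selective idea concrete. It runs projected gradient ascent (PGD) under an L-infinity budget, maximizing entropy only at the positions that were highest-entropy on the clean input. Everything about the "model" is an assumption:

- A hypothetical random linear map W stands in for a VLM.
- Positions are scored independently rather than autoregressively.
- The entropy gradient is derived by hand.
- The budget eps = 8/255 and the step size are common image-attack conventions, not values from the paper.

```python
import numpy as np

def softmax_and_entropy(logits):
    """Row-wise probabilities, log-probabilities and entropy (nats) for logits of shape (T, V)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    p = np.exp(log_p)
    return p, log_p, -(p * log_p).sum(axis=-1)

def selective_entropy_pgd(W, x, mask, eps=8 / 255, step=1 / 255, iters=50):
    """Toy L-infinity PGD that maximizes the summed entropy at the masked positions only.

    W: (T, V, d) hypothetical linear 'model'; logits at position t are W[t] @ (x + delta).
    x: (d,) clean input in [0, 1].  mask: (T,) boolean, True at positions to attack.
    """
    delta = np.zeros_like(x)
    for _ in range(iters):
        logits = W @ (x + delta)  # (T, V)
        p, log_p, H = softmax_and_entropy(logits)
        # dH_t/dz_t = -p_t * (log p_t + H_t); positions outside the mask contribute nothing.
        g_logits = -p * (log_p + H[:, None]) * mask[:, None]
        grad = np.einsum("tvd,tv->d", W, g_logits)  # chain rule back to the input
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)  # ascent step, then L-inf projection
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep the perturbed input in the valid range
    return delta

# Toy usage with made-up sizes: 10 positions, 20-token vocabulary, 64-dim input.
rng = np.random.default_rng(0)
T, V, d = 10, 20, 64
W = rng.normal(size=(T, V, d))
x = rng.uniform(0.0, 1.0, size=d)
_, _, H_clean = softmax_and_entropy(W @ x)
mask = np.zeros(T, dtype=bool)
mask[np.argsort(H_clean)[-2:]] = True  # top 20% of 10 positions by clean entropy
delta = selective_entropy_pgd(W, x, mask)
_, _, H_adv = softmax_and_entropy(W @ (x + delta))
print("entropy at selected positions:", H_clean[mask].sum(), "->", H_adv[mask].sum())
```

Setting every entry of `mask` to True turns the same loop into a global entropy attack. The two objectives therefore differ only in which positions contribute to the gradient.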
