Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models

Rui Zhang; Zihan Wang; Tianli Yang; Hongwei Li; Wenbo Jiang; Qingchuan Zhao; Yang Liu; Guowen Xu

arXiv:2508.18805·cs.CR·August 27, 2025

TL;DR

This paper introduces Hidden Tail, a stealthy adversarial-image attack on vision-language models that drives outputs to the maximum length by appending special tokens invisible to users, so the visible response shows no abnormal content. The result exposes a gap in VLM robustness against efficiency attacks.

Contribution

The paper proposes a prompt-agnostic adversarial image attack that maximizes VLM output length while keeping the visible response free of abnormal content, addressing the trade-off between effectiveness and stealthiness that limits prior resource consumption attacks.

Findings

01 Outperforms existing attacks, increasing output length by up to 19.2×

02 Reaches the maximum token limit while preserving stealthiness

03 Demonstrates the need for improved robustness against efficiency attacks

Abstract

Vision-Language Models (VLMs) are increasingly deployed in real-world applications, but their high inference cost makes them vulnerable to resource consumption attacks. Prior attacks attempt to extend VLM output sequences by optimizing adversarial images, thereby increasing inference costs. However, these extended outputs often introduce irrelevant abnormal content, compromising attack stealthiness. This trade-off between effectiveness and stealthiness poses a major limitation for existing attacks. To address this challenge, we propose Hidden Tail, a stealthy resource consumption attack that crafts prompt-agnostic adversarial images, inducing VLMs to generate maximum-length outputs by appending special tokens invisible to users. Our method employs a composite loss function that balances semantic preservation, repetitive special token induction, and suppression of the…
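The abstract does not give the loss formulation, so the following is only a rough sketch of what a weighted composite objective of this shape could look like, not the authors' method. Every name here (`neg_log_likelihood`, `composite_loss`, `third_term`, `weights`) is hypothetical. The semantic-preservation term is modeled as the negative log-likelihood of a reference response, and the special-token term as the negative log-likelihood of one special token at every position after it. The third term is truncated in the abstract above, so it is treated as an opaque value supplied by the caller rather than guessed.

```python
import math


def neg_log_likelihood(step_probs, target_ids):
    """Mean negative log-probability of target_ids under per-step distributions.

    step_probs: list of per-step probability lists (one list per output position).
    target_ids: list of token ids, one per position.
    """
    if not target_ids:
        return 0.0
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for probs, token_id in zip(step_probs, target_ids):
        total += -math.log(probs[token_id] + eps)
    return total / len(target_ids)


def composite_loss(step_probs, reference_ids, special_id,
                   third_term=0.0, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three terms (illustrative assumption, not the paper's loss).

    - semantic: the first len(reference_ids) positions should reproduce the reference.
    - induction: every later position should emit the special token.
    - third_term: unspecified in the truncated abstract; passed in as a scalar.
    """
    n_ref = len(reference_ids)
    head, tail = step_probs[:n_ref], step_probs[n_ref:]
    semantic = neg_log_likelihood(head, reference_ids)
    induction = neg_log_likelihood(tail, [special_id] * len(tail))
    w_sem, w_ind, w_third = weights
    return w_sem * semantic + w_ind * induction + w_third * third_term


# Toy usage: 3-token vocabulary, reference response [0, 1], special token id 2.
peaked = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 4
loss_peaked = composite_loss(peaked, [0, 1], special_id=2)
loss_uniform = composite_loss(uniform, [0, 1], special_id=2)
```

In this toy setup the loss is lowest when the head of the output matches the reference and the tail consists only of the special token, which is the behavior the abstract describes. In an actual attack the probabilities would come from the VLM and the loss would be minimized with respect to the image perturbation; that optimization loop is outside this sketch.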
