Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding
Shuoyang Sun, Chang Dai, Hao Fang, Kuofeng Gao, Xinhao Zhong, Yi Sun, Fan Mo, Shu-Tao Xia, Bin Chen

TL;DR
This paper introduces Mistletoe, a stealthy attack that exploits vulnerabilities in speculative decoding of large language models, significantly reducing acceleration efficiency while maintaining output quality.
Contribution
It reveals a new mechanism-level vulnerability in speculative decoding and proposes Mistletoe, an attack that degrades acceleration without affecting model outputs.
Findings
Mistletoe substantially reduces the average accepted length τ.
It collapses speedup and lowers token throughput.
Output quality and perplexity are preserved.
Abstract
Speculative decoding has become a widely adopted technique for accelerating large language model (LLM) inference by drafting multiple candidate tokens and verifying them with a target model in parallel. Its efficiency, however, critically depends on the average accepted length τ, i.e., how many draft tokens survive each verification step. In this work, we identify a new mechanism-level vulnerability in model-based speculative decoding: the drafter is trained to approximate the target model distribution, but this approximation is inevitably imperfect. Such a drafter-target mismatch creates a hidden attack surface where small perturbations can preserve the target model's visible behavior while substantially reducing draft-token acceptability. We propose Mistletoe, a stealthy acceleration-collapse attack against speculative decoding. Mistletoe directly targets the acceptance mechanism…
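To make the role of τ concrete, the following is a minimal sketch of how drafter-target mismatch lowers the average accepted length. It is not the Mistletoe attack and is not taken from the paper. It assumes the standard speculative sampling rule (accept a draft token x with probability min(1, p(x)/q(x)), where p is the target distribution and q the drafter distribution), and it further assumes, purely for illustration, that p and q are the same at every position. The function names, the toy distributions, and the draft length `gamma` are all hypothetical.

```python
import random


def accepted_length(p, q, gamma, rng):
    """Count how many of `gamma` draft tokens survive one verification step.

    Assumption: standard speculative sampling acceptance, with the same
    toy distributions p (target) and q (drafter) at every position.
    """
    tokens = list(range(len(q)))
    accepted = 0
    for _ in range(gamma):
        x = rng.choices(tokens, weights=q)[0]  # drafter proposes a token
        if rng.random() < min(1.0, p[x] / q[x]):  # target verifies it
            accepted += 1
        else:
            break  # first rejection discards the rest of the draft
    return accepted


def average_accepted_length(p, q, gamma=5, steps=20000, seed=0):
    """Monte Carlo estimate of the mean number of accepted draft tokens."""
    rng = random.Random(seed)
    total = sum(accepted_length(p, q, gamma, rng) for _ in range(steps))
    return total / steps


def closed_form(p, q, gamma=5):
    """Expected accepted draft tokens: sum of alpha**k for k = 1..gamma,
    where alpha = sum_x min(p(x), q(x)) is the per-token acceptance rate."""
    alpha = sum(min(pi, qi) for pi, qi in zip(p, q))
    return sum(alpha ** k for k in range(1, gamma + 1))


# Hypothetical toy distributions over a three-token vocabulary.
target = [0.7, 0.2, 0.1]
drafter_close = [0.6, 0.3, 0.1]  # well-aligned drafter
drafter_far = [0.1, 0.2, 0.7]    # badly mismatched drafter

tau_close = average_accepted_length(target, drafter_close)
tau_far = average_accepted_length(target, drafter_far)
print(f"aligned drafter:    {tau_close:.2f} accepted draft tokens per step")
print(f"mismatched drafter: {tau_far:.2f} accepted draft tokens per step")
```

In this toy setting the target distribution is identical in both runs, so the text sampled from the target would be distributed the same way; only the number of surviving draft tokens changes. That is the sense in which a drop in acceptance can stay invisible in the output while still removing the speedup.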
