Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding

Shuoyang Sun; Chang Dai; Hao Fang; Kuofeng Gao; Xinhao Zhong; Yi Sun; Fan Mo; Shu-Tao Xia; Bin Chen

arXiv:2605.14005 · cs.CL · May 19, 2026

TL;DR

This paper introduces Mistletoe, a stealthy attack that exploits vulnerabilities in speculative decoding of large language models, significantly reducing acceleration efficiency while maintaining output quality.

Contribution

The paper reveals a new mechanism-level vulnerability in model-based speculative decoding, the imperfect match between drafter and target model, and proposes Mistletoe, an attack that degrades acceleration while leaving the target model's visible behavior intact.

Findings

01. Mistletoe substantially reduces the average accepted length τ.

02. It collapses speedup and lowers token throughput.

03. Output quality and perplexity are preserved.

Abstract

Speculative decoding has become a widely adopted technique for accelerating large language model (LLM) inference by drafting multiple candidate tokens and verifying them with a target model in parallel. Its efficiency, however, critically depends on the average accepted length $\tau$, i.e., how many draft tokens survive each verification step. In this work, we identify a new mechanism-level vulnerability in model-based speculative decoding: the drafter is trained to approximate the target model distribution, but this approximation is inevitably imperfect. Such a drafter-target mismatch creates a hidden attack surface where small perturbations can preserve the target model's visible behavior while substantially reducing draft-token acceptability. We propose Mistletoe, a stealthy acceleration-collapse attack against speculative decoding. Mistletoe directly targets the acceptance mechanism…
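To make the role of the accepted length concrete, here is a minimal sketch of how drafter-target mismatch lowers it. It is not the paper's attack or evaluation code. It assumes the standard speculative sampling rule, in which a draft token x is accepted with probability min(1, p(x)/q(x)) for target distribution p and drafter distribution q, so the per-token acceptance rate is the sum over x of min(p(x), q(x)). It also assumes every position has the same acceptance rate and that drafting stops at the first rejection. The toy distributions and the names `acceptance_rate` and `expected_accepted_drafts` are hypothetical.

```python
def acceptance_rate(p, q):
    """Per-token acceptance probability under the assumed standard rule:
    sum over the vocabulary of min(p(x), q(x))."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def expected_accepted_drafts(alpha, gamma):
    """Expected number of draft tokens accepted out of gamma drafted,
    assuming a constant acceptance rate alpha at every position and
    a stop at the first rejection: alpha + alpha^2 + ... + alpha^gamma."""
    return sum(alpha ** i for i in range(1, gamma + 1))


# Toy next-token distributions over a 3-token vocabulary (illustrative only).
target = [0.7, 0.2, 0.1]
aligned_drafter = [0.6, 0.3, 0.1]     # close to the target
mismatched_drafter = [0.1, 0.2, 0.7]  # far from the target

gamma = 4  # number of tokens drafted per verification step (assumed)

for name, drafter in [("aligned", aligned_drafter),
                      ("mismatched", mismatched_drafter)]:
    alpha = acceptance_rate(target, drafter)
    accepted = expected_accepted_drafts(alpha, gamma)
    print(f"{name}: acceptance rate {alpha:.2f}, "
          f"expected accepted drafts {accepted:.2f} of {gamma}")
```

In this toy setting the target distribution is identical in both runs, so the target's own sampling behavior does not change; only the share of drafted tokens that survive verification drops. Note that some papers count the extra token emitted by the target model at each step when reporting the accepted length, which would add one to the figure computed here.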
