When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
Jiankun Wei, Abdulrahman Abdulrazzag, Tianchen Zhang, Adel Muursepp, Gururaj Saileshwar

TL;DR
This paper uncovers a side-channel attack on large language models that use speculative decoding, enabling fingerprinting of user queries and leakage of confidential data, and proposes mitigations such as packet padding and token aggregation.
Contribution
It reveals a novel side-channel attack on LLMs exploiting speculative decoding patterns and evaluates effective defenses against this vulnerability.
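The leak comes from a simple fact: the number of tokens emitted per decoding iteration depends on how many drafted tokens the target model accepts, and that depends on the text being generated. Below is a minimal sketch of generic greedy draft-then-verify decoding. It is not a model of REST, LADE, BiLD, or EAGLE specifically. `tokens_per_iteration`, `target`, and `draft` are hypothetical names: `target` stands for what the target model would emit at each position, and `draft` stands for what a hypothetical draft model would propose given the correct prefix.

```python
from typing import List

def tokens_per_iteration(target: List[str], draft: List[str], k: int = 4) -> List[int]:
    """Tokens emitted per decoding iteration under greedy draft-then-verify decoding.

    target[i]: token the target model emits at position i (greedy, so deterministic).
    draft[i]:  token a hypothetical draft model proposes at position i, given the
               correct prefix target[:i].
    Each iteration drafts up to k tokens. The target keeps the longest matching
    prefix and then emits one token of its own (the correction or a bonus token).
    """
    if len(draft) != len(target):
        raise ValueError("draft and target must cover the same positions")
    counts: List[int] = []
    pos = 0
    while pos < len(target):
        accepted = 0
        while (accepted < k and pos + accepted < len(target)
               and draft[pos + accepted] == target[pos + accepted]):
            accepted += 1
        # Accepted drafts plus the target's own next token, capped at the sequence end.
        emitted = min(accepted + 1, len(target) - pos)
        counts.append(emitted)
        pos += emitted
    return counts

target = "the patient has type two diabetes".split()
draft = "the patient had type two diabetes".split()  # draft mispredicts "has"
print(tokens_per_iteration(target, draft, k=4))  # -> [3, 3]
```

When the draft predicts a response well, the trace has long runs of large counts. When it predicts poorly, the trace is mostly 1s. That input dependence is what an observer of per-iteration output can pick up.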
Findings
Adversaries can fingerprint user prompts (from a set of 50) with over 75% accuracy at temperature 0.3.
Confidential datastore contents can be leaked at over 25 tokens/sec.
Mitigations such as packet padding reduce information leakage.
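The Findings name packet padding as a mitigation, and the TL;DR also mentions token aggregation. The sketch below gives one plausible reading of each, under the assumption that the observer sees one packet per release. `pad_packets`, `aggregate_tokens`, `pad_to`, and `group` are hypothetical names, and the paper's actual designs may differ.

```python
from typing import List

def pad_packets(sizes: List[int], pad_to: int) -> List[int]:
    """Packet padding: pad every packet up to a fixed size, hiding per-iteration counts."""
    if any(s > pad_to for s in sizes):
        raise ValueError("pad_to must be at least the largest packet size")
    return [pad_to] * len(sizes)

def aggregate_tokens(counts: List[int], group: int) -> List[int]:
    """Token aggregation: buffer emitted tokens and release them in fixed-size groups.

    The observer sees only group-sized releases, plus one final partial flush,
    so speculation boundaries no longer line up with packets.
    """
    released: List[int] = []
    buffered = 0
    for c in counts:
        buffered += c
        while buffered >= group:
            released.append(group)
            buffered -= group
    if buffered:
        released.append(buffered)  # flush the remainder when generation ends
    return released

print(pad_packets([52, 44, 60, 48], pad_to=64))  # -> [64, 64, 64, 64]
print(aggregate_tokens([3, 1, 5, 2], group=4))   # -> [4, 4, 3]
```

In this sketch the final partial group still reveals the total length modulo `group`. The moment each group is released also depends on the speculation pattern, which is a separate timing channel that packet sizes alone do not capture.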
Abstract
Deployed large language models (LLMs) often rely on speculative decoding, a technique that generates and verifies multiple candidate tokens in parallel, to improve throughput and latency. In this work, we reveal a new side-channel whereby input-dependent patterns of correct and incorrect speculations can be inferred by monitoring per-iteration token counts or packet sizes. In evaluations using research prototypes and production-grade vLLM serving frameworks, we show that an adversary monitoring these patterns can fingerprint user queries (from a set of 50 prompts) with over 75% accuracy across four speculative-decoding schemes at temperature 0.3: REST (100%), LADE (91.6%), BiLD (95.2%), and EAGLE (77.6%). Even at temperature 1.0, accuracy remains far above the 2% random baseline: REST (99.6%), LADE (61.2%), BiLD (63.6%), and EAGLE (24%). We also show the capability of the attacker to…
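The abstract's key observation is that a network observer can recover per-iteration token counts from packet sizes. The following is a minimal sketch assuming a hypothetical streaming format in which each decoding iteration goes out as one packet containing a fixed-size header plus a fixed-width 4-byte token ID per token. `HEADER_BYTES`, `BYTES_PER_TOKEN`, and both functions are illustrative assumptions, not the paper's or vLLM's wire format. Real deployments usually stream variable-length text, so recovery there is noisier.

```python
from typing import List

# Hypothetical wire format (an assumption): one packet per decoding iteration,
# made of a fixed header plus 4 bytes per token ID.
HEADER_BYTES = 40
BYTES_PER_TOKEN = 4

def packet_sizes(counts: List[int]) -> List[int]:
    """Server side: size of the packet emitted for each iteration."""
    return [HEADER_BYTES + BYTES_PER_TOKEN * c for c in counts]

def infer_token_counts(sizes: List[int]) -> List[int]:
    """Attacker side: invert the size formula to recover tokens per iteration."""
    counts: List[int] = []
    for size in sizes:
        payload = size - HEADER_BYTES
        if payload <= 0 or payload % BYTES_PER_TOKEN:
            raise ValueError(f"packet size {size} does not fit the assumed format")
        counts.append(payload // BYTES_PER_TOKEN)
    return counts

observed = packet_sizes([3, 1, 5, 2])  # -> [52, 44, 60, 48]
print(infer_token_counts(observed))    # -> [3, 1, 5, 2]
```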
Peer Reviews
Decision: Submitted to ICLR 2026
It is important to draw attention to the potential privacy risks of LLM inference techniques, given how widely LLMs and inference optimizations are used today. The paper runs simple, proof-of-concept experiments in controlled, simulated settings that demonstrate that speculative decoding can leak information about user prompts via packet sizes. The paper also proposes and evaluates defenses against the attack, giving concrete mitigations that LLM providers can implement. Code and documentation
* The high accuracies reported in the abstract are achieved only in limited settings: low temperature (0.3), with the exact set of 50 test prompts known and used at training time. When the temperature increases, or when the exact test prompts are not trained on, accuracy decreases significantly, although it remains above random guessing. Much of what the attack does seems to be memorizing the fingerprints of specific responses, as its brittleness to higher temperature indicates.
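The memorization concern is easy to see with a nearest-neighbor matcher over token-count traces, because it can only assign an observed trace to a prompt whose trace it has already stored. This is a hypothetical sketch, not the paper's classifier. The L1 distance over zero-padded traces and the reference traces are illustrative choices. With 50 candidate prompts, uniform guessing is correct 1/50 = 2% of the time, which matches the random baseline quoted in the abstract.

```python
from typing import Dict, List, Optional

def trace_distance(a: List[int], b: List[int]) -> int:
    """L1 distance between two token-count traces, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))

def fingerprint(trace: List[int], references: Dict[str, List[List[int]]]) -> Optional[str]:
    """Return the prompt label whose stored trace is closest to the observed one."""
    best_label: Optional[str] = None
    best_dist: Optional[int] = None
    for label, traces in references.items():
        for ref in traces:
            d = trace_distance(trace, ref)
            if best_dist is None or d < best_dist:
                best_label, best_dist = label, d
    return best_label

# Hypothetical reference traces, recorded by replaying known prompts.
refs = {
    "prompt_A": [[3, 3, 1, 4], [3, 2, 1, 4]],
    "prompt_B": [[1, 1, 2, 1, 1]],
}
print(fingerprint([3, 3, 1, 3], refs))  # -> prompt_A
```

Higher sampling temperature makes the traces for a single prompt more varied, so a matcher like this needs more stored traces per prompt to remain accurate.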
1. This paper is the first to reveal a packet-size-based side-channel attack introduced by speculative decoding techniques in LLMs. It explicitly differentiates this work from prior LLM side-channel attacks, such as token-length leakage and timing attacks, by focusing on input-dependent speculation patterns. 2. The attack is validated across four speculative decoding schemes (REST, LADE, BiLD, EAGLE) and tested in both academic prototypes and the production-grade vLLM serving framework, confi
1. There are still issues with writing and typesetting: for example, the caption of Figure 1, the excessively small font size in Figure 5, and the overly compact layout of tables (e.g., Table 1 and Table 4). 2. Although Experiment 3 (semantically similar but non-identical queries) and Section 4.8 (out-of-distribution training) evaluate the fingerprinting attack under approximate or out-of-distribution dataset setups, both configurations remain somewhat idealized. 3. The paper provides no j
1. The paper is mostly well written and easy to follow. 2. It is an interesting observation that per-iteration token counts or packet sizes can leak private information. 3. The attack works across different speculative decoding schemes. 4. The defense mechanisms can effectively reduce the risk of information leakage.
1. The experiments are limited to a narrow medical chatbot scenario with only a small number of diseases. There are no experiments that scale up the number of possible labels (here, diseases). 2. The proposed defense mechanisms are all very costly and not very practical.
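To see where the cost of the defenses comes from, here is a back-of-the-envelope sketch using hypothetical numbers, not measurements from the paper. Padding spends extra bandwidth on every packet. Aggregation holds tokens back until a full group is ready, which delays what the user sees. `padding_overhead` and `first_flush_iteration` are illustrative helpers.

```python
from typing import List

def padding_overhead(sizes: List[int], pad_to: int) -> float:
    """Fraction of extra bytes sent when every packet is padded to pad_to bytes."""
    original = sum(sizes)
    if original == 0:
        return 0.0
    padded = pad_to * len(sizes)
    return (padded - original) / original

def first_flush_iteration(counts: List[int], group: int) -> int:
    """1-based iteration at which token aggregation first releases tokens to the user."""
    total = 0
    for i, c in enumerate(counts, start=1):
        total += c
        if total >= group:
            return i
    return len(counts)  # tokens are flushed only when generation ends

print(round(padding_overhead([52, 44, 60, 48], pad_to=64), 3))  # -> 0.255
print(first_flush_iteration([1, 1, 1, 3], group=4))             # -> 4
```

Without aggregation, the first token would reach the user after iteration 1. With a group of 4, nothing is visible until iteration 4 in this example. Larger padding targets and larger groups hide more of the speculation pattern, but they raise these costs.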
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Topic Modeling
