Fast Collaborative Inference via Distributed Speculative Decoding
Ce Zheng, Ke Zhang, Chen Sun, Wenqi Zhang, Qiong Liu, Angesom Ataklity Tesfay

TL;DR
This paper introduces TSLT, a sparsify-then-sample strategy that reduces uplink communication in distributed speculative decoding for large language models, maintaining performance while improving efficiency.
Contribution
The paper proposes TSLT, a method that transmits only the logits and indices of a truncated candidate set, reducing uplink overhead in distributed LLM inference, with theoretical guarantees that the acceptance rate is preserved.
Findings
Significantly reduces uplink communication compared with transmitting full-vocabulary logits
Maintains inference latency and model quality
Effective in AI-RAN distributed inference scenarios
Abstract
Speculative decoding accelerates large language model (LLM) inference by allowing a small draft model to predict multiple future tokens for verification by a larger target model. In AI-native radio access networks (AI-RAN), this enables device-edge collaborative inference but introduces significant uplink overhead, as existing distributed speculative decoding schemes transmit full-vocabulary logits at every step. We propose a sparsify-then-sample strategy, Truncated Sparse Logits Transmission (TSLT), which transmits only the logits and indices of a truncated candidate set. We provide theoretical guarantees showing that the acceptance rate is preserved under TSLT. TSLT is further extended to the multi-candidate case, where multiple draft candidates per step increase the acceptance probability. Experiments show that TSLT significantly reduces uplink communication while maintaining end-to-end…
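To make the sparsify-then-sample idea concrete, here is a minimal sketch of the device side, not the authors' implementation. It assumes the truncated candidate set is chosen by top-k (the abstract does not say how the set is built), and the function names, the 2-byte logit width, and the 4-byte index width are all hypothetical choices made for illustration.

```python
import math
import random

def truncate_logits(logits, k):
    """Keep the k largest logits; return (indices, values) ordered by index.
    Top-k is an assumption; the source only says 'truncated candidate set'."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    top.sort()
    return top, [logits[i] for i in top]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def sparsify_then_sample(logits, k, rng):
    """Truncate first, then sample the draft token from the renormalized
    truncated distribution, so the sampled token is always in the sent set."""
    indices, values = truncate_logits(logits, k)
    probs = softmax(values)
    token = rng.choices(indices, weights=probs, k=1)[0]
    return token, indices, values

def payload_sizes(vocab_size, k, logit_bytes=2, index_bytes=4):
    """Bytes per draft step: full logits vs. k (index, logit) pairs.
    Byte widths are illustrative assumptions, not values from the paper."""
    return vocab_size * logit_bytes, k * (logit_bytes + index_bytes)

# Usage with a synthetic 32,000-entry vocabulary (hypothetical size).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(32000)]
token, indices, values = sparsify_then_sample(logits, k=64, rng=rng)
full_bytes, sparse_bytes = payload_sizes(len(logits), k=64)
```

Under these assumed sizes, the uplink payload per draft step shrinks from `vocab_size * logit_bytes` to `k * (logit_bytes + index_bytes)`. Because sampling happens after truncation, the server can rebuild the exact draft distribution from the transmitted pairs when verifying the draft token.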
Taxonomy
Topics: Wireless Signal Modulation Classification · Privacy-Preserving Technologies in Data · Advanced Data and IoT Technologies
