SSSD: Simply-Scalable Speculative Decoding
Michele Marzollo, Jiawei Zhuang, Niklas Roemer, Niklas Zwingenberger, Lorenz K. Müller, Lukas Cavigelli

TL;DR
SSSD is a training-free speculative decoding method that reduces large language model inference latency by up to 2.9x relative to standard autoregressive decoding with minimal added complexity. It matches the performance of more complex, training-based approaches and remains robust across tasks and domains.
Contribution
Introduces SSSD, a novel training-free speculative decoding technique combining lightweight n-gram matching with hardware-aware speculation for scalable inference.
Findings
Reduces latency by up to 2.9x compared to standard decoding.
Achieves performance comparable to training-based methods.
Exhibits superior robustness under language and domain shifts.
Abstract
Speculative Decoding has emerged as a popular technique for accelerating inference in Large Language Models. However, most existing approaches yield only modest improvements in production serving systems. Methods that achieve substantial speedups typically rely on an additional trained draft model or auxiliary model components, increasing deployment and maintenance complexity. This added complexity reduces flexibility, particularly when serving workloads shift to tasks, domains, or languages that are not well represented in the draft model's training data. We introduce Simply-Scalable Speculative Decoding (SSSD), a training-free method that combines lightweight n-gram matching with hardware-aware speculation. Relative to standard autoregressive decoding, SSSD reduces latency by up to 2.9x. It achieves performance on par with leading training-based approaches across a broad range of…
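The core idea in the abstract, drafting tokens by n-gram matching instead of with a trained draft model, can be illustrated with a minimal Python sketch. It assumes greedy decoding and a hypothetical `next_token` function standing in for the target model; `propose_draft`, `verify_greedy`, `speculative_generate`, the n-gram size `n`, and the draft length `k` are illustrative names, not SSSD's API. The drafting rule used here (copy what followed the most recent earlier occurrence of the last `n` tokens, in the style of prompt lookup) is an assumption, and SSSD's hardware-aware speculation is not modeled at all.

```python
# Illustrative sketch only, not SSSD's implementation: n-gram drafting plus
# greedy verification. Hardware-aware speculation is NOT modeled here.
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the target LLM under greedy decoding:
# maps the current token context to its argmax next token.
NextToken = Callable[[List[int]], int]


def propose_draft(context: Sequence[int], n: int, k: int) -> List[int]:
    """Propose up to k draft tokens by finding the most recent earlier
    occurrence of the last n context tokens and copying what followed it."""
    if len(context) <= n:
        return []
    suffix = list(context[-n:])
    # Scan right-to-left, excluding the suffix itself, so the latest match wins.
    for start in range(len(context) - n - 1, -1, -1):
        if list(context[start:start + n]) == suffix:
            return list(context[start + n:start + n + k])
    return []  # no match: this step degenerates to ordinary decoding


def verify_greedy(context: Sequence[int], draft: Sequence[int],
                  next_token: NextToken) -> List[int]:
    """Accept the longest draft prefix that agrees with the target model's
    greedy choices, plus one target token (a correction or a bonus token).
    A real system scores all draft positions in one batched forward pass;
    this loop calls next_token once per position but accepts the same tokens."""
    ctx = list(context)
    accepted: List[int] = []
    for tok in draft:
        target = next_token(ctx)
        if target != tok:
            accepted.append(target)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(next_token(ctx))  # every draft token accepted: one bonus token
    return accepted


def speculative_generate(prompt: Sequence[int], next_token: NextToken,
                         max_new: int, n: int = 2,
                         k: int = 4) -> Tuple[List[int], int]:
    """Greedy generation with n-gram drafting; returns (tokens, verification steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        out.extend(verify_greedy(out, propose_draft(out, n, k), next_token))
        steps += 1
    return out[:len(prompt) + max_new], steps


# Hypothetical toy "model" that emits the cycle 1, 2, 3, 4, 5, 1, 2, ...
PATTERN = [1, 2, 3, 4, 5]


def toy_next_token(ctx: List[int]) -> int:
    return PATTERN[len(ctx) % len(PATTERN)]


if __name__ == "__main__":
    prompt = [1, 2, 3, 4, 5, 1, 2]
    tokens, steps = speculative_generate(prompt, toy_next_token, max_new=10)
    print(tokens, steps)  # 10 new tokens in 2 verification steps instead of 10
```

Because each draft token is accepted only if it equals the target model's own greedy choice, the output is identical to plain greedy decoding; the n-gram lookup changes only how many tokens each verification step can commit. On highly repetitive toy input like this one nearly every draft is accepted, which is a best case rather than a measure of real-world speedup.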
