Speculative Decoding Scaling Laws (SDSL): Throughput Optimization Made Simple

Amirhossein Bozorgkhoo; Igor Molybog

arXiv:2603.11053·cs.CL·March 13, 2026

Open Access

TL;DR

This paper introduces a theoretical framework for optimizing the throughput of speculative decoding in language models, enabling prediction of optimal hyperparameters without costly training.

Contribution

It provides an analytical theory linking the key hyperparameters of pre-trained LLMs to the inference throughput of a speculative decoding system, simplifying the optimization of such systems.

Findings

01

The theory accurately predicts throughput-optimal hyperparameters.

02

It reduces the need for extensive experimental tuning.

03

It enhances inference efficiency for large language models.

Abstract

Speculative decoding is a technique that uses multiple language models to accelerate inference. Previous works have used an experimental approach to optimize the throughput of the inference pipeline, which involves LLM training and can be costly. This study of speculative decoding proposes a theory that analytically connects the key hyperparameters of pre-trained LLMs to the throughput efficiency of a downstream SD-based inference system. The theory allows the prediction of throughput-optimal hyperparameters for the components of an inference system before their pre-training.
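To make "throughput of an SD-based inference system" concrete, the sketch below uses the widely cited idealized model from earlier speculative decoding work, not the theory proposed in this paper, which the abstract does not spell out. It assumes each drafted token is accepted independently with probability alpha, that the draft model proposes gamma tokens per step, and that one draft-model pass costs cost_ratio times one target-model pass. All function and parameter names are hypothetical.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model pass.

    Assumes i.i.d. acceptance with probability alpha for each of the
    gamma drafted tokens (an idealization, not the paper's model).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus one token from the target model.
        return float(gamma + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Idealized speedup over plain autoregressive decoding.

    One step costs gamma draft passes plus one target pass, measured in
    units of a single target-model pass.
    """
    step_cost = gamma * cost_ratio + 1.0
    return expected_tokens_per_step(alpha, gamma) / step_cost


def best_gamma(alpha: float, cost_ratio: float, max_gamma: int = 32) -> int:
    """Draft length that maximizes the idealized speedup (brute-force search)."""
    return max(range(max_gamma + 1), key=lambda g: speedup(alpha, g, cost_ratio))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for alpha in (0.6, 0.8, 0.9):
        g = best_gamma(alpha, cost_ratio=0.05)
        print(f"alpha={alpha}: best gamma={g}, speedup={speedup(alpha, g, 0.05):.2f}")
```

In this toy model the acceptance rate and the cost ratio are taken as given inputs. The paper's stated contribution is to predict throughput-optimal hyperparameters of the component models before pre-training, which this sketch does not attempt.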

Taxonomy

Topics: Natural Language Processing Techniques · Machine Learning and Algorithms · Machine Learning and Data Classification