TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding

Zhaoxuan Wu; Zijian Zhou; Arun Verma; Alok Prakash; Daniela Rus; Bryan Kian Hsiang Low

arXiv:2502.15197 · cs.CL · June 2, 2025

TL;DR

TETRIS is a new method that improves batch speculative decoding in large language models by actively selecting the most promising draft tokens for multiple requests, increasing throughput and resource efficiency.

Contribution

TETRIS introduces an active draft token selection strategy that optimizes throughput in multi-request batch decoding, outperforming existing methods.

Findings

01. Higher acceptance rate than baseline methods

02. More effective utilization of inference capacity

03. Consistently better performance in empirical tests

Abstract

We propose TETRIS, a novel method that optimizes the total throughput of batch speculative decoding in multi-request settings. Unlike existing methods that optimize for a single request or a group of requests as a whole, TETRIS actively selects the most promising draft tokens (for every request in a batch) to be accepted when verified in parallel, resulting in fewer rejected tokens and hence fewer wasted computing resources. Such effective resource utilization for fast inference in large language models (LLMs) is especially important to service providers with limited inference capacity. Compared to baseline speculative decoding, TETRIS yields a consistently higher acceptance rate and more effective utilization of the limited inference capacity. We show theoretically and empirically that TETRIS outperforms baseline speculative decoding and existing methods that dynamically…
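
The selection idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that the "promise" of a draft token is scored by the product of draft-model probabilities along its prefix, and that the server can verify at most `capacity` draft tokens per step across the whole batch. The function name `select_draft_tokens`, the input layout, and the scoring rule are all hypothetical choices made for illustration.

```python
import heapq


def select_draft_tokens(draft_probs, capacity):
    """Greedy batch-level draft token selection (illustrative sketch only).

    draft_probs[i][j] is an assumed draft-model probability for the j-th
    drafted token of request i. Returns, for each request, how many leading
    draft tokens to send for verification, with at most `capacity` in total.
    """
    keep = [0] * len(draft_probs)

    # Max-heap via negated scores. A token's score is the product of the
    # probabilities of all draft tokens up to and including it, so scores
    # never increase along a request and prefixes are always chosen in order.
    heap = []
    for i, probs in enumerate(draft_probs):
        if probs:
            heapq.heappush(heap, (-probs[0], i))

    selected = 0
    while heap and selected < capacity:
        neg_score, i = heapq.heappop(heap)
        keep[i] += 1
        selected += 1
        j = keep[i]
        if j < len(draft_probs[i]):
            # Extend the prefix of request i by one more draft token.
            heapq.heappush(heap, (neg_score * draft_probs[i][j], i))

    return keep


if __name__ == "__main__":
    batch = [
        [0.9, 0.8, 0.1],     # request 0: third token is unlikely to be accepted
        [0.5, 0.4],          # request 1: weak drafts throughout
        [0.95, 0.9, 0.85],   # request 2: strong drafts throughout
    ]
    print(select_draft_tokens(batch, capacity=5))  # [2, 0, 3]
```

In this toy batch, the budget of five tokens goes to the requests whose drafts look most likely to survive verification, rather than being split evenly across requests. A fixed per-request draft length would instead spend part of the budget on tokens that are likely to be rejected.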
