Speculative Decoding with a Speculative Vocabulary

Miles Williams; Young D. Kwon; Rui Li; Alexandros Kouris; Stylianos I. Venieris

arXiv:2602.13836 · cs.CL · February 17, 2026

Open Access

TL;DR

This paper introduces SpecVocab, a vocabulary speculation method for accelerating language model inference. It dynamically selects a vocabulary subset during decoding, improving throughput without sacrificing decoding accuracy.

Contribution

The paper proposes SpecVocab, an approach that enhances speculative decoding by dynamically selecting a vocabulary subset instead of using a fixed reduced vocabulary, and reports higher throughput than existing methods such as EAGLE-3.

Findings

01. Achieves up to 8.1% higher throughput than EAGLE-3.

02. Demonstrates effectiveness across various tasks.

03. Maintains decoding accuracy with dynamic vocabulary selection.

Abstract

Speculative decoding has rapidly emerged as a leading approach for accelerating language model (LM) inference, as it offers substantial speedups while yielding identical outputs. This relies upon a small draft model, tasked with predicting the outputs of the target model. State-of-the-art speculative decoding methods use a draft model consisting of a single decoder layer and output embedding matrix, with the latter dominating drafting time for the latest LMs. Recent work has sought to address this output distribution bottleneck by reducing the vocabulary of the draft model. Although this can improve throughput, it compromises speculation effectiveness when the target token is out-of-vocabulary. In this paper, we argue for vocabulary speculation as an alternative to a reduced vocabulary. We propose SpecVocab, an efficient and effective method that selects a vocabulary subset per decoding…
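To make the output-embedding bottleneck and the out-of-vocabulary failure mode concrete, here is a minimal pure-Python sketch. It is an illustration only, not the paper's implementation: the function names, the toy embedding matrix, and the hand-picked candidate lists are all assumptions, and how SpecVocab actually chooses its subset is not shown here.

```python
from typing import Dict, List, Sequence


def subset_logits(
    hidden: Sequence[float],
    output_embedding: Sequence[Sequence[float]],
    candidate_ids: Sequence[int],
) -> Dict[int, float]:
    """Score only the candidate tokens: one dot product per candidate row."""
    return {
        tok: sum(h * w for h, w in zip(hidden, output_embedding[tok]))
        for tok in candidate_ids
    }


def draft_token(
    hidden: Sequence[float],
    output_embedding: Sequence[Sequence[float]],
    candidate_ids: Sequence[int],
) -> int:
    """Greedy draft prediction restricted to the candidate tokens."""
    logits = subset_logits(hidden, output_embedding, candidate_ids)
    return max(logits, key=logits.get)


def multiply_adds(num_tokens: int, hidden_size: int) -> int:
    """Cost of the output projection: one multiply-add per (token, dimension)."""
    return num_tokens * hidden_size


# Toy data (hypothetical): 6-token vocabulary, hidden size 3.
EMBEDDING: List[List[float]] = [
    [0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.2, 0.2, 0.2],
]
HIDDEN: List[float] = [1.0, 0.0, 2.0]

full_choice = draft_token(HIDDEN, EMBEDDING, range(len(EMBEDDING)))
# A subset that happens to exclude the full-vocabulary winner (out-of-vocabulary case).
miss_choice = draft_token(HIDDEN, EMBEDDING, [0, 2, 4])
# A subset that includes the full-vocabulary winner.
hit_choice = draft_token(HIDDEN, EMBEDDING, [2, 3, 5])

full_cost = multiply_adds(len(EMBEDDING), len(HIDDEN))
subset_cost = multiply_adds(3, len(HIDDEN))
```

In this toy setup, scoring three candidates halves the projection cost relative to the full six-token vocabulary, but the draft can only propose the full-vocabulary winner when that token is among the candidates. That is the trade-off the abstract describes for reduced vocabularies.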

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods