Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Edgar Schönfeld, Ali Thabet, Jonas Kohler

TL;DR
This paper introduces a novel verification method for speculative decoding in large language models, enabling faster inference by recognizing correct non-aligned responses, achieving significant speedups without sacrificing quality.
Contribution
It proposes a new verification approach inspired by LLMs as judges, trained to recognize valid continuations beyond strict alignment, significantly improving decoding speed.
Findings
Achieves up to 9x speedup over Llama-405B.
Maintains high-quality outputs across various benchmarks.
Reaches over 130 tokens/sec on high-end hardware.
Abstract
The performance of large language models (LLMs) is closely linked to their underlying size, leading to ever-growing networks and hence slower inference. Speculative decoding has been proposed as a technique to accelerate autoregressive generation, leveraging a fast draft model to propose candidate tokens, which are then verified in parallel based on their likelihood under the target model. While this approach guarantees to reproduce the target output, it incurs a substantial penalty: many high-quality draft tokens are rejected, even when they represent objectively valid continuations. Indeed, we show that even powerful draft models such as GPT-4o, as well as human text, cannot achieve high acceptance rates under the standard verification scheme. This severely limits the speedup potential of current speculative decoding methods, as an early rejection becomes overwhelmingly likely when…
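For readers unfamiliar with the standard verification scheme the abstract refers to, the following is a minimal sketch of the widely used speculative-sampling acceptance rule, not the judge-based verifier this paper proposes. It assumes each position's draft and target distributions are available as plain dictionaries mapping token ids to probabilities; the function name `verify_draft` and its arguments are hypothetical and chosen only for illustration.

```python
import random


def verify_draft(draft_tokens, draft_probs, target_probs, rng=None):
    """Standard likelihood-based verification (illustrative sketch only).

    draft_tokens: token ids proposed by the draft model
    draft_probs:  draft_probs[i][tok]  = draft probability q_i(tok)
    target_probs: target_probs[i][tok] = target probability p_i(tok)
    Returns (accepted_tokens, correction_token or None).
    """
    rng = rng or random.Random()
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p = target_probs[i].get(tok, 0.0)
        q = draft_probs[i][tok]  # > 0, since tok was sampled from the draft
        # Accept the draft token with probability min(1, p / q).
        if rng.random() < min(1.0, p / q):
            accepted.append(tok)
            continue
        # First rejection: resample from the residual max(0, p - q)
        # and discard every later draft token.
        residual = {
            t: max(0.0, pt - draft_probs[i].get(t, 0.0))
            for t, pt in target_probs[i].items()
        }
        tokens = list(residual)
        weights = [residual[t] for t in tokens]
        correction = rng.choices(tokens, weights=weights, k=1)[0]
        return accepted, correction
    # All draft tokens accepted (the usual bonus token drawn from the
    # target model is omitted here for brevity).
    return accepted, None
```

Because everything after the first rejected position is thrown away, a single low-likelihood token early in the draft discards all later tokens, even when the draft as a whole is a valid continuation. This is the limitation the abstract describes.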
Taxonomy
Topics: Artificial Intelligence in Law · Law, Economics, and Judicial Systems · Legal and Constitutional Studies
