Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment

Gregor Bachmann; Sotiris Anagnostidis; Albert Pumarola; Markos Georgopoulos; Artsiom Sanakoyeu; Yuming Du; Edgar Schönfeld; Ali Thabet; Jonas Kohler

arXiv:2501.19309·cs.LG·February 3, 2025

Open Access

TL;DR

This paper introduces a verification method for speculative decoding in large language models that accepts correct draft tokens even when they are not aligned with the target model's output, yielding faster inference without sacrificing quality.

Contribution

The paper proposes a verification scheme inspired by the use of LLMs as judges: the verifier is trained to recognize valid continuations rather than only those that match the target model, which significantly improves decoding speed.

Findings

01 Achieves up to 9x speedup over Llama-405B.

02 Maintains high-quality outputs across various benchmarks.

03 Reaches over 130 tokens/sec on high-end hardware.

Abstract

The performance of large language models (LLMs) is closely linked to their underlying size, leading to ever-growing networks and hence slower inference. Speculative decoding has been proposed as a technique to accelerate autoregressive generation, leveraging a fast draft model to propose candidate tokens, which are then verified in parallel based on their likelihood under the target model. While this approach guarantees to reproduce the target output, it incurs a substantial penalty: many high-quality draft tokens are rejected, even when they represent objectively valid continuations. Indeed, we show that even powerful draft models such as GPT-4o, as well as human text, cannot achieve high acceptance rates under the standard verification scheme. This severely limits the speedup potential of current speculative decoding methods, as an early rejection becomes overwhelmingly likely when…
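
For readers unfamiliar with the standard verification scheme the abstract refers to, the following is a minimal sketch of likelihood-based speculative sampling verification as it is commonly described in the speculative decoding literature. It is not the judge-based method this paper proposes. The function names `verify_draft` and `sample`, and the representation of distributions as plain lists of probabilities, are assumptions made for illustration.

```python
import random


def sample(weights, rng):
    """Draw an index in proportion to non-negative weights (hypothetical helper)."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def verify_draft(draft_tokens, draft_probs, target_probs, rng):
    """Sketch of standard speculative-sampling verification.

    draft_tokens: token ids proposed by the draft model.
    draft_probs:  draft_probs[i] is the draft distribution at step i.
    target_probs: target_probs[i] is the target distribution at step i;
                  it holds one extra entry for the bonus token.
    Returns the tokens emitted in this round.
    """
    emitted = []
    for i, tok in enumerate(draft_tokens):
        p = target_probs[i][tok]  # target likelihood of the draft token
        q = draft_probs[i][tok]   # draft likelihood (> 0, since tok was drafted)
        if rng.random() < min(1.0, p / q):
            emitted.append(tok)   # accepted: keep the draft token
            continue
        # Rejected: resample from the residual max(0, p - q) and stop.
        # All later draft tokens are discarded, however good they were.
        residual = [max(0.0, pt - qt)
                    for pt, qt in zip(target_probs[i], draft_probs[i])]
        emitted.append(sample(residual, rng))
        return emitted
    # Every draft token was accepted: draw one bonus token from the target.
    emitted.append(sample(target_probs[len(draft_tokens)], rng))
    return emitted
```

In this sketch a single rejection ends the round, so a draft token to which the target model assigns low probability is discarded together with everything after it, even if it is a valid continuation. That is the penalty the abstract describes.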

Taxonomy

Topics: Artificial Intelligence in Law · Law, Economics, and Judicial Systems · Legal and Constitutional Studies