Constrained Decoding with Speculative Lookaheads

Nishanth Nakshatri; Shamik Roy; Rajarshi Das; Suthee Chaidaroon; Leonid Boytsov; Rashmi Gangadharaiah

arXiv:2412.10418·cs.CL·February 12, 2025

TL;DR

The paper introduces CDSL, a novel decoding method that uses speculative lookaheads to significantly improve inference efficiency in constrained decoding tasks with large language models, achieving up to 12.15x speedup.

Contribution

The paper proposes a speculative decoding technique that pairs a smaller draft LLM with a larger target LLM to improve efficiency while maintaining high constraint satisfaction.

Findings

01 Achieves 2.2x to 12.15x speedup over existing methods.

02 Maintains strong performance with minimal constraint satisfaction loss.

03 Validated across multiple tasks and LLM families.

Abstract

Constrained decoding with lookahead heuristics (CDLH) is a highly effective method for aligning LLM generations to human preferences. However, the extensive lookahead roll-out operations for each generated token make CDLH prohibitively expensive, resulting in low adoption in practice. In contrast, common decoding strategies such as greedy decoding are extremely efficient, but achieve very low constraint satisfaction. We propose constrained decoding with speculative lookaheads (CDSL), a technique that significantly improves upon the inference efficiency of CDLH without experiencing the drastic performance reduction seen with greedy decoding. CDSL is motivated by the recently proposed idea of speculative decoding, which uses a much smaller draft LLM for generation and a larger target LLM for verification. In CDSL, the draft model is used to generate lookaheads, which are verified by a…
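The draft-then-verify idea that the abstract borrows from speculative decoding can be illustrated with a minimal sketch. This is generic greedy speculative decoding over toy integer tokens, not the CDSL algorithm itself: it omits constraint scoring entirely, and the names `speculative_greedy_decode`, `greedy_decode`, `toy_draft`, and `toy_target` are hypothetical stand-ins rather than anything from the paper. The sketch assumes both models are deterministic next-token functions.

```python
from typing import Callable, List

Token = int
# Assumption: a "model" is any function mapping a token context to its
# greedy next token. Real LLMs would return logits instead.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(next_token: NextToken, prompt: List[Token],
                  max_new_tokens: int) -> List[Token]:
    """Plain greedy decoding with a single model (the baseline)."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_greedy_decode(draft_next: NextToken, target_next: NextToken,
                              prompt: List[Token], max_new_tokens: int,
                              lookahead: int = 4) -> List[Token]:
    """Draft proposes a short lookahead; target keeps the matching prefix."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The draft model greedily proposes up to `lookahead` tokens.
        k = min(lookahead, limit - len(tokens))
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target model checks each proposed token in order.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token verified
            else:
                accepted.append(expected)  # replace first mismatch, stop
                break
        tokens.extend(accepted)
    return tokens


# Hypothetical toy models: the target counts upward modulo 10; the draft
# agrees with it except right after token 4, where it guesses wrongly.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


result = speculative_greedy_decode(toy_draft, toy_target, [0], 8)
```

With greedy verification the output matches what the target model alone would produce, so `result` equals `greedy_decode(toy_target, [0], 8)`. Any speedup in a real system comes from the target scoring all proposed positions in one forward pass, which this pure-Python loop does not model.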

Taxonomy

Topics: Constraint Satisfaction and Optimization