Constrained Decoding with Speculative Lookaheads
Nishanth Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah

TL;DR
The paper introduces CDSL, a novel decoding method that uses speculative lookaheads to significantly improve inference efficiency in constrained decoding tasks with large language models, achieving up to 12.15x speedup.
Contribution
The paper proposes a decoding technique, motivated by speculative decoding, that pairs a small draft LLM with a larger target LLM to reduce inference cost while maintaining high constraint satisfaction.
Findings
Achieves 2.2x to 12.15x speedup over existing methods.
Maintains strong performance with minimal constraint satisfaction loss.
Validated across multiple tasks and LLM families.
Abstract
Constrained decoding with lookahead heuristics (CDLH) is a highly effective method for aligning LLM generations to human preferences. However, the extensive lookahead roll-out operations for each generated token make CDLH prohibitively expensive, resulting in low adoption in practice. In contrast, common decoding strategies such as greedy decoding are extremely efficient but achieve very low constraint satisfaction. We propose constrained decoding with speculative lookaheads (CDSL), a technique that significantly improves upon the inference efficiency of CDLH without experiencing the drastic performance reduction seen with greedy decoding. CDSL is motivated by the recently proposed idea of speculative decoding, which uses a much smaller draft LLM for generation and a larger target LLM for verification. In CDSL, the draft model is used to generate lookaheads, which are verified by a…
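The draft-then-verify idea in the abstract can be illustrated with a toy sketch. The abstract is cut off before it says how lookaheads are verified, so the verification rule here is an assumption: a drafted token is kept only if it matches the target's top-ranked candidate that satisfies the constraint, which is the greedy-agreement rule commonly used to explain speculative decoding, not necessarily the rule CDSL uses. All names (`decode_with_speculative_lookahead`, `draft_next`, `target_rank`, `satisfies`) are hypothetical, and the "models" are lookup tables rather than LLMs.

```python
from typing import Callable, List, Tuple

Token = str


def first_valid(candidates: List[Token], context: List[Token],
                satisfies: Callable[[List[Token]], bool]) -> Token:
    """Return the highest-ranked candidate that keeps the constraint satisfied."""
    for tok in candidates:
        if satisfies(context + [tok]):
            return tok
    raise ValueError("no candidate satisfies the constraint")


def decode_with_speculative_lookahead(
    draft_next: Callable[[List[Token]], Token],        # cheap draft model (assumed)
    target_rank: Callable[[List[Token]], List[Token]],  # target candidates, best first (assumed)
    satisfies: Callable[[List[Token]], bool],           # constraint check (assumed)
    prompt: List[Token],
    max_new_tokens: int,
    lookahead: int = 3,
) -> Tuple[List[Token], int]:
    out = list(prompt)
    rounds = 0  # number of draft-then-verify rounds
    while len(out) - len(prompt) < max_new_tokens:
        budget = min(lookahead, max_new_tokens - (len(out) - len(prompt)))
        # 1) The draft model proposes a short lookahead.
        proposal: List[Token] = []
        for _ in range(budget):
            proposal.append(draft_next(out + proposal))
        rounds += 1
        # 2) Verify token by token; on the first mismatch the target's
        #    token is kept and the rest of the lookahead is discarded.
        for tok in proposal:
            wanted = first_valid(target_rank(out), out, satisfies)
            out.append(wanted)
            if wanted != tok:
                break
    return out, rounds


# Toy models: lookup tables keyed on the last token.
DRAFT = {"a": "b", "b": "c", "c": "b"}
TARGET = {"a": ["b", "c"], "b": ["bad", "c"], "c": ["a", "b"]}

tokens, rounds = decode_with_speculative_lookahead(
    draft_next=lambda ctx: DRAFT[ctx[-1]],
    target_rank=lambda ctx: TARGET[ctx[-1]],
    satisfies=lambda seq: "bad" not in seq,  # constraint: never emit "bad"
    prompt=["a"],
    max_new_tokens=6,
)
```

In this toy run the six new tokens are produced in two draft-then-verify rounds instead of six single-token steps, and the constraint-violating token `"bad"` never appears. A real system would verify the whole lookahead in one batched pass of the target model rather than token by token.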
Taxonomy
Topics: Constraint Satisfaction and Optimization
