Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Luca Beurer-Kellner, Marc Fischer, Martin Vechev

TL;DR
This paper introduces DOMINO, a novel constrained decoding algorithm for large language models that enforces formal-language constraints with virtually no overhead, improving both accuracy and speed over existing methods.
Contribution
We propose DOMINO, a subword-aligned constrained decoding algorithm that reduces overhead and enhances task accuracy compared to prior constrained decoding approaches.
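To make "subword-aligned" concrete, here is a minimal Python sketch, not the authors' implementation. The constraint is a small hand-written DFA for a hypothetical regex (a bracketed list of integers), and a vocabulary token is allowed whenever all of its characters can be consumed by the automaton, even if the token straddles a boundary between grammar terminals. The DFA, `step`, `run`, `allowed_tokens`, and the toy vocabulary are illustrative assumptions.

```python
# Hypothetical constraint: the regex \[[0-9]+(,[0-9]+)*\], hand-compiled into a DFA.
# States: 0 = start, 1 = after "[", 2 = inside a number, 3 = after ",", 4 = accept.
DIGITS = set("0123456789")
DEAD = -1


def step(state: int, ch: str) -> int:
    """Advance the DFA by one character; return DEAD on an invalid character."""
    if state == 0:
        return 1 if ch == "[" else DEAD
    if state in (1, 3):
        return 2 if ch in DIGITS else DEAD
    if state == 2:
        if ch in DIGITS:
            return 2
        return {",": 3, "]": 4}.get(ch, DEAD)
    return DEAD  # the accept state 4 has no outgoing edges


def run(state: int, token: str) -> int:
    """Feed a token's characters through the DFA, ignoring where terminals begin or end."""
    for ch in token:
        state = step(state, ch)
        if state == DEAD:
            return DEAD
    return state


def allowed_tokens(state: int, vocab: list[str]) -> list[str]:
    """Subword-aligned mask: keep every token whose full text leaves the DFA alive.
    Every non-DEAD state in this DFA can still reach the accept state, so alive = not DEAD."""
    return [tok for tok in vocab if run(state, tok) != DEAD]


VOCAB = ["[", "[1", "1", "12", ",", ",3", "]", "1]", "2,", "a", " "]
print(allowed_tokens(0, VOCAB))  # ['[', '[1']
print(allowed_tokens(2, VOCAB))  # ['1', '12', ',', ',3', ']', '1]', '2,']
```

Tokens such as `[1`, `1]` and `2,` each span two terminals. A checker that only admitted tokens lying entirely inside a single terminal would reject them and push the model toward unusual tokenizations, which is the kind of vocabulary/constraint misalignment the abstract links to reduced task accuracy.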
Findings
DOMINO achieves near-zero overhead during constrained decoding.
DOMINO can nearly double the speed of unconstrained decoding in some cases.
DOMINO outperforms existing constrained decoding methods in both speed and accuracy.
Abstract
To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding enforces strict formal language constraints during generation. However, as we show in this work, not only do such methods incur performance overhead during generation, but many of them also significantly impair task accuracy if they do not correctly align the underlying LLM sub-word vocabularies with external constraints. To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and, in some cases, an almost 2× speedup over unconstrained decoding, thereby outperforming existing approaches by a wide margin.
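The abstract credits pre-computation and speculative decoding for the low overhead. The Python sketch below illustrates two simpler ideas in that spirit, under stated assumptions: per-state token masks are computed once before decoding, so each step is a table lookup rather than a vocabulary scan, and whenever exactly one token is legal it is appended without consulting the model. This forced-token shortcut is a stand-in, not DOMINO's speculative decoding scheme; the constraint `id=[0-9]+;`, the vocabulary, `precompute_masks`, `decode`, and `toy_score` (a placeholder for an LLM) are all hypothetical.

```python
from typing import Callable

# Hypothetical constraint: the regex id=[0-9]+;  (states 0..5, state 5 = accept)
DIGITS = set("0123456789")
DEAD = -1
ACCEPT = {5}
LITERAL = {(0, "i"): 1, (1, "d"): 2, (2, "="): 3, (4, ";"): 5}


def step(state: int, ch: str) -> int:
    """One DFA transition on a single character; DEAD if the character is invalid."""
    if state in (3, 4) and ch in DIGITS:
        return 4
    return LITERAL.get((state, ch), DEAD)


def run(state: int, text: str) -> int:
    """Feed a whole token string through the DFA, character by character."""
    for ch in text:
        state = step(state, ch)
        if state == DEAD:
            return DEAD
    return state


def precompute_masks(vocab: list[str], num_states: int) -> dict[int, list[tuple[str, int]]]:
    """Offline pass: for every DFA state, the (token, next_state) pairs that stay legal."""
    table = {}
    for s in range(num_states):
        pairs = []
        for tok in vocab:
            nxt = run(s, tok)
            if nxt != DEAD:
                pairs.append((tok, nxt))
        table[s] = pairs
    return table


def decode(table, score: Callable[[str, str], float], max_steps: int = 32):
    """Greedy constrained decoding; returns (text, model_calls, forced_tokens)."""
    state, text, calls, forced = 0, "", 0, 0
    for _ in range(max_steps):
        if state in ACCEPT:  # enough here because state 5 has no outgoing edges
            break
        options = table[state]  # lookup into the precomputed table
        if not options:
            raise ValueError(f"constraint unsatisfiable from state {state}")
        if len(options) == 1:
            tok, state = options[0]  # only one legal token: skip the model call
            forced += 1
        else:
            calls += 1
            tok, state = max(options, key=lambda o: score(text, o[0]))
        text += tok
    return text, calls, forced


TARGET = "id=42;"  # hypothetical text the stub "model" prefers


def toy_score(prefix: str, tok: str) -> float:
    """Stand-in for an LLM: rewards tokens that continue TARGET, longer ones slightly more."""
    return (1.0 if TARGET.startswith(prefix + tok) else 0.0) + 0.01 * len(tok)


VOCAB = ["id=", "=", "4", "42", ";", "7;", "x"]
TABLE = precompute_masks(VOCAB, num_states=6)
print(decode(TABLE, toy_score))  # ('id=42;', 2, 1)
```

Here the forced `id=` token saves one of three model calls. With a real LLM the forced tokens still have to enter the model's context, but they can be processed together in the next forward pass rather than one call per token.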
