Automata-based constraints for language model decoding
Terry Koo, Frederick Liu, Luheng He

TL;DR
This paper introduces an automata-based method for constraining language model decoding so that outputs efficiently and correctly conform to formal languages such as JSON or YAML. The method resolves the mismatch between tokenization and grammar, and it scales to task-specific formats.
Contribution
The authors develop a novel automata-based framework for constraining language model outputs to formal languages. It is a faster, provably correct, and extensible alternative to earlier bespoke solutions.
Findings
Compiles constraints approximately 7,000 times faster than existing approaches
Ensures provable correctness of constrained decoding
Extensible to deterministic context-free languages
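A regular language cannot require brackets to be balanced, which is why extending beyond regular languages matters. The Python sketch below illustrates the general idea with a pushdown automaton (a finite-state controller plus a stack) for properly nested `()` and `[]`. At each step it filters a toy multi-character vocabulary, keeping only tokens whose every character is a legal move. The vocabulary, `advance`, and `allowed_tokens` are hypothetical illustrations, not the authors' construction. The sketch also re-simulates each token at every step instead of compiling anything ahead of time.

```python
from typing import List, Optional, Tuple

# Hypothetical multi-character vocabulary; "<eos>" ends generation.
VOCAB = ["(", ")", "[", "]", "()", "[]", "((", "))", ")]", "<eos>"]
PAIRS = {")": "(", "]": "["}  # closing bracket -> required opener

def advance(stack: Tuple[str, ...], text: str) -> Optional[Tuple[str, ...]]:
    """Run the pushdown automaton over `text`; return the new stack, or None if rejected."""
    for ch in text:
        if ch in "([":
            stack = stack + (ch,)  # push an opening bracket
        elif ch in PAIRS and stack and stack[-1] == PAIRS[ch]:
            stack = stack[:-1]  # pop the matching opener
        else:
            return None  # mismatched or unexpected character
    return stack

def allowed_tokens(stack: Tuple[str, ...]) -> List[str]:
    """Tokens whose every character is a legal move from the current stack."""
    allowed = []
    for tok in VOCAB:
        if tok == "<eos>":
            if not stack:  # stop only once every bracket is closed
                allowed.append(tok)
        elif advance(stack, tok) is not None:
            allowed.append(tok)
    return allowed

print(allowed_tokens(("[", "(")))  # ['(', ')', '[', '()', '[]', '((', ')]']
```

Here `)]` is accepted because it closes both open brackets within a single token. From the stack `("(",)`, the tokens `]` and `))` are rejected because they would close a bracket that is not open. In this language every correctly matched prefix can still be completed, so per-character simulation is enough to keep decoding on track. Richer grammars would need a more careful viability check.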
Abstract
Language models (LMs) are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee conformance, especially with smaller LMs suitable for large-scale deployment. In addition, tuning requires significant resources, making it impractical for uncommon or task-specific formats. To prevent downstream parsing errors we would ideally constrain the LM to only produce valid output, but this is severely complicated by tokenization, which is typically both ambiguous and misaligned with the formal grammar. We solve these issues through the application of automata theory, deriving an efficient closed-form solution for the regular languages, a broad class of formal languages with many practical applications, including API calls or schema-guided…
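A small example helps show why tokenization complicates constraints and how automata address it. Start from a character-level DFA for a regular language. Then precompute, for each DFA state, which vocabulary tokens keep the automaton out of the dead state and which state each one leads to. This correctly handles multi-character tokens such as `-1` that straddle grammar boundaries. The sketch assumes a toy integer grammar `-?(0|[1-9][0-9]*)`, a hypothetical nine-token vocabulary, and fake logits. All names (`char_step`, `compile_token_table`, `constrained_greedy`) are illustrative. The sketch uses brute-force simulation of every (state, token) pair rather than the paper's automata-theoretic derivation, so it says nothing about the reported compilation speed.

```python
import math
from typing import Dict, List, Optional

# Hypothetical character-level DFA for integers matching -?(0|[1-9][0-9]*).
# States: 0 = start, 1 = after "-", 2 = after a lone "0", 3 = inside a nonzero number.
# A missing transition means the (implicit) dead state.
ACCEPTING = {2, 3}
NUM_STATES = 4
DIGITS = "0123456789"

def char_step(state: int, ch: str) -> Optional[int]:
    """One DFA transition; returns None if `ch` leads to the dead state."""
    if state == 0 and ch == "-":
        return 1
    if state in (0, 1) and ch == "0":
        return 2
    if state in (0, 1) and ch in DIGITS[1:]:
        return 3
    if state == 3 and ch in DIGITS:
        return 3
    return None

# Hypothetical subword vocabulary; "-1" and "05" each span several grammar symbols.
EOS = "<eos>"
VOCAB = ["-", "0", "1", "7", "12", "-1", "05", "a", EOS]

def compile_token_table() -> Dict[int, Dict[int, int]]:
    """For every DFA state, map each legal token id to the state it leads to."""
    table: Dict[int, Dict[int, int]] = {}
    for state in range(NUM_STATES):
        moves: Dict[int, int] = {}
        for tok_id, tok in enumerate(VOCAB):
            if tok == EOS:
                if state in ACCEPTING:  # may only stop on a complete string
                    moves[tok_id] = state
                continue
            cur: Optional[int] = state
            for ch in tok:  # a token is legal only if every character is
                cur = char_step(cur, ch)
                if cur is None:
                    break
            if cur is not None:
                moves[tok_id] = cur
        table[state] = moves
    return table

def constrained_greedy(logit_steps: List[List[float]], table: Dict[int, Dict[int, int]]) -> str:
    """Greedy decoding that sets illegal tokens' logits to -inf before the argmax."""
    state, out = 0, []
    for logits in logit_steps:
        masked = [x if i in table[state] else -math.inf for i, x in enumerate(logits)]
        best = max(range(len(masked)), key=masked.__getitem__)
        if VOCAB[best] == EOS:
            break
        out.append(VOCAB[best])
        state = table[state][best]
    return "".join(out)

table = compile_token_table()
steps = [
    [0.1, 0.2, 0.3, 0.0, 0.4, 0.9, 0.5, 2.0, 1.0],  # unconstrained argmax: "a"
    [0.0, 0.0, 0.0, 0.0, 1.5, 0.0, 0.0, 3.0, 0.5],  # unconstrained argmax: "a"
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # end of sequence
]
print(constrained_greedy(steps, table))  # -112
```

Without the mask, the fake model would emit `a` at the first two steps. With the mask, decoding instead yields `-112`, which is a valid integer. The table is computed once per grammar, so each decoding step reduces to a dictionary lookup plus masking.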
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Multi-Agent Systems and Negotiation
