SynCode: LLM Generation with Grammar Augmentation
Shubham Ugare, Tarun Suresh, Hangoo Kang, Sasa Misailovic, Gagandeep Singh

TL;DR
SynCode is a novel framework that enhances large language models' ability to generate syntactically correct code and data formats by integrating grammar-based filtering, significantly reducing syntax errors across multiple languages.
Contribution
SynCode introduces a grammar augmentation method using DFA masks to ensure sound and complete syntax adherence in LLM outputs across various formal languages.
Findings
Eliminates all syntax errors in JSON generation.
Reduces syntax errors in Python and Go code by 96.07%.
Outperforms existing baselines in syntactical accuracy.
Abstract
LLMs are widely used in complex AI applications. These applications underscore the need for LLM outputs to adhere to a specific format so that they can be integrated with other components in the system. Typically, the format rules, e.g., for data serialization formats such as JSON and YAML or for code in a programming language, are expressed as a context-free grammar (CFG). Due to the hallucinations and unreliability of LLMs, instructing LLMs to adhere to a specified syntax becomes an increasingly important challenge. We present SynCode, a novel framework for efficient and general syntactical decoding with LLMs, to address this challenge. SynCode ensures soundness and completeness with respect to the CFG of a formal language, effectively retaining valid tokens while filtering out invalid ones. SynCode uses an offline-constructed, efficient lookup table, the DFA mask store, derived from the DFA of the…
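To make the idea of an offline-built mask lookup concrete, here is a minimal sketch in Python. It is not the authors' implementation: the toy DFA (a single integer-literal terminal), the six-token vocabulary, and the names `step`, `walk`, `MASK_STORE`, `apply_mask`, and `greedy_pick` are all assumptions made for illustration, and the sketch leaves out the CFG parsing that a full grammar-constrained decoder would need. It only shows how a table keyed by DFA state can be computed once and then used to filter invalid tokens at decoding time.

```python
import math

DIGITS = "0123456789"

# Toy DFA for a hypothetical INT terminal (one or more digits).
# States: 0 = start, 1 = at least one digit seen; None marks the dead state.
def step(state, ch):
    return 1 if ch in DIGITS else None

def walk(state, text):
    """Run the DFA over `text` from `state`; return None if it dies."""
    for ch in text:
        state = step(state, ch)
        if state is None:
            return None
    return state

VOCAB = ["1", "23", "4a", "x", " ", "007"]  # hypothetical toy vocabulary
STATES = [0, 1]

# Offline step: for every DFA state, record which vocabulary tokens
# keep the DFA alive. This table plays the role of a mask store.
MASK_STORE = {s: [walk(s, tok) is not None for tok in VOCAB] for s in STATES}

def apply_mask(logits, state):
    """Replace the logits of tokens that are invalid in `state` with -inf."""
    return [l if ok else -math.inf for l, ok in zip(logits, MASK_STORE[state])]

def greedy_pick(logits, state):
    """Pick the highest-scoring token among those the mask allows."""
    masked = apply_mask(logits, state)
    best = max(range(len(masked)), key=masked.__getitem__)
    return VOCAB[best]

if __name__ == "__main__":
    # The unconstrained argmax would be "4a"; the mask rules it out.
    logits = [0.1, 0.2, 5.0, 3.0, 1.0, 0.3]
    print(greedy_pick(logits, 0))
```

Because the table is built before generation starts, the per-step cost in this sketch is a dictionary lookup plus an elementwise mask, which is the kind of saving an offline-constructed lookup table is meant to provide.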
Taxonomy
Topics: Natural Language Processing Techniques · Digital Rights Management and Security
Methods: Direct Feedback Alignment
