Continuous Diffusion Models Can Obey Formal Syntax
Jinwoo Kim, Taylor Berg-Kirkpatrick, Loris D'Antoni

TL;DR
This paper presents a training-free guidance method that steers continuous diffusion language models toward syntactically valid outputs. It uses the gradient of an analytic score derived from a regular expression, so constraint satisfaction improves without retraining the model or training an auxiliary classifier.
Contribution
The authors introduce a training-free guidance technique that steers continuous diffusion language models to obey formal syntax constraints expressed as regular expressions, using the gradient of an analytic score in place of a trained auxiliary classifier.
Findings
Achieves 68-96% constraint satisfaction on JSON and language benchmarks.
Outperforms autoregressive constrained decoding in both constraint satisfaction and output quality.
Imposes syntactic constraints with minimal perplexity cost.
Abstract
Diffusion language models offer a promising alternative to autoregressive models due to their global, non-causal generation process, but their continuous latent dynamics make discrete constraints -- e.g., the output should be a JSON file that matches a given schema -- difficult to impose. We introduce a training-free guidance method for steering continuous diffusion language models to satisfy formal syntactic constraints expressed using regular expressions. Our approach constructs an analytic score estimating the probability that a latent state decodes to a valid string accepted by a given regular expression, and uses its gradient to guide sampling, without training auxiliary classifiers. The denoising process targets the base model conditioned on syntactic validity. We implement our method in Diffinity on top of the PLAID diffusion model and evaluate it on 180 regular-expression…
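The abstract's "analytic score" can be illustrated with a small sketch. It is not the paper's implementation. It assumes that a latent state decodes to an independent categorical distribution over tokens at each position, and that the regular expression has already been compiled to a DFA. Under those assumptions, the probability that the decoded string is accepted can be computed exactly by a forward pass over DFA states. The function name, the transition-table format, and the toy DFA for a*b are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def acceptance_probability(
    token_probs: List[Dict[str, float]],
    transitions: Dict[Tuple[int, str], int],
    start: int,
    accepting: Set[int],
) -> float:
    """Probability that a string sampled position by position is accepted by the DFA.

    Assumption (not from the paper): positions are sampled independently.
    """
    # mass[q] = probability that the prefix read so far leaves the DFA in state q
    mass: Dict[int, float] = {start: 1.0}
    for probs in token_probs:
        next_mass: Dict[int, float] = {}
        for state, p_state in mass.items():
            for token, p_token in probs.items():
                target = transitions.get((state, token))
                if target is None:
                    continue  # no transition: this prefix can never be accepted
                next_mass[target] = next_mass.get(target, 0.0) + p_state * p_token
        mass = next_mass
    # Only mass that ends in an accepting state counts as a valid string
    return sum(p for state, p in mass.items() if state in accepting)


# Hypothetical DFA for the regular expression a*b over the alphabet {a, b}:
# state 0 loops on "a" and moves to accepting state 1 on "b".
TRANSITIONS = {(0, "a"): 0, (0, "b"): 1}

# Two positions, each with its own token distribution.
probs = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
score = acceptance_probability(probs, TRANSITIONS, start=0, accepting={1})
print(score)  # approximately 0.72: only "ab" is accepted, with probability 0.9 * 0.8
```

The quantity is a sum of products of the per-position probabilities, so it is differentiable with respect to them. Writing the same recursion in an automatic-differentiation framework would give the gradient of its logarithm with respect to the latent state. By Bayes' rule, adding that gradient to the base model's score gives the score of the distribution conditioned on validity, which is the general shape of the guidance the abstract describes.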
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution
