Syntactic Control of Language Models by Posterior Inference

Vicky Xefteri; Tim Vieira; Ryan Cotterell; Afra Amini

arXiv:2506.07154 · cs.CL · June 10, 2025


TL;DR

This paper introduces a sampling-based method combining posterior inference and syntactic tagging to control the syntactic structure of language model outputs, significantly improving syntactic accuracy without losing fluency.

Contribution

It presents a novel approach that combines sequential Monte Carlo sampling with a syntactic tagger to enforce a target syntactic structure during generation, yielding substantial gains in syntactic accuracy.

Findings

01

Syntactic accuracy (F1) rose from 12.31 to about 93 for GPT2-large.

02

Syntactic accuracy (F1) rose from 35.33 to about 93 for Llama3-8B.

03

The method controls syntax without compromising the language model's fluency.
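The findings report syntactic accuracy as an F1 score. This summary does not spell out the evaluation, so the sketch below assumes a labeled-bracket F1 (PARSEVAL style) between the parse of a generated sentence and the target constituency tree. Spans are encoded as hypothetical `(label, start, end)` tuples, and `bracket_f1` is an illustrative helper, not the paper's evaluation code.

```python
from collections import Counter


def bracket_f1(predicted, gold):
    """Labeled bracketing F1 between two collections of (label, start, end) spans.

    Spans are treated as multisets, so a duplicated bracket only counts as
    matched as many times as it appears in both trees.
    """
    pred, ref = Counter(predicted), Counter(gold)
    matched = sum((pred & ref).values())  # size of the multiset intersection
    if matched == 0:
        return 0.0
    precision = matched / sum(pred.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Consider gold spans S(0,3), NP(0,2), VP(2,3) for "the dog runs" and a prediction that mislabels the last span as NP. Two of the three brackets match, so precision, recall, and F1 are all 2/3, or about 66.67 on the 0 to 100 scale used above.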

Abstract

Controlling the syntactic structure of text generated by language models is valuable for applications requiring clarity, stylistic consistency, or interpretability, yet it remains a challenging task. In this paper, we argue that sampling algorithms based on posterior inference can effectively enforce a target constituency structure during generation. Our approach combines sequential Monte Carlo, which estimates the posterior distribution by sampling from a proposal distribution, with a syntactic tagger that ensures that each generated token aligns with the desired syntactic structure. Our experiments with GPT2 and Llama3-8B models show that with an appropriate proposal distribution, we can improve syntactic accuracy, increasing the F1 score from $12.31$ (GPT2-large) and $35.33$ (Llama3-8B) to about $93$ in both cases without compromising the language model's fluency. These results…
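The abstract describes sequential Monte Carlo (SMC) as sampling partial outputs from a proposal distribution and reweighting them toward the posterior, where a tagger checks each token against the target structure. The sketch below illustrates that loop under heavy simplifying assumptions, and it is not the authors' implementation:

- A hypothetical prefix-independent table, `toy_lm`, stands in for GPT2 or Llama3.
- A hypothetical lexicon, `TAG_OF`, assigns one fixed tag per word and stands in for the syntactic tagger.
- The target constituency structure is flattened to a tag sequence.
- The proposal renormalizes the LM over tokens whose tag matches the next target tag. This is one simple choice and not necessarily the proposal used in the paper.

```python
import math
import random

# Hypothetical toy lexicon: each token has exactly one tag.
# Assumption: a stand-in for the paper's syntactic tagger.
TAG_OF = {"the": "DT", "a": "DT", "dog": "NN", "cat": "NN",
          "runs": "VB", "sleeps": "VB", "<eos>": "END"}


def toy_lm(prefix):
    """Hypothetical next-token distribution p(y_t | prefix).

    It ignores the prefix for simplicity; a real LM would condition on it.
    """
    return {"the": 0.3, "a": 0.1, "dog": 0.2, "cat": 0.1,
            "runs": 0.15, "sleeps": 0.05, "<eos>": 0.1}


def smc_sample(lm, target_tags, num_particles=8, seed=0):
    """SMC with a tag-constrained proposal and multinomial resampling each step.

    Returns the final particles and an estimate of log P(tags == target).
    """
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    log_z = 0.0
    for target in target_tags:
        extended, weights = [], []
        for prefix in particles:
            probs = lm(prefix)
            # Keep only tokens whose tag matches the next target tag.
            allowed = {tok: p for tok, p in probs.items() if TAG_OF[tok] == target}
            mass = sum(allowed.values())
            if mass == 0.0:
                # No consistent continuation: this particle gets weight 0.
                extended.append(prefix)
                weights.append(0.0)
                continue
            toks = list(allowed)
            tok = rng.choices(toks, weights=[allowed[t] for t in toks])[0]
            extended.append(prefix + [tok])
            # Incremental weight p(tok | prefix) * 1[tag ok] / q(tok | prefix),
            # which simplifies to the allowed probability mass.
            weights.append(mass)
        total = sum(weights)
        if total == 0.0:
            raise ValueError("no particle can match the target tags")
        log_z += math.log(total / num_particles)
        # Resample particles in proportion to their weights.
        particles = [list(p) for p in
                     rng.choices(extended, weights=weights, k=num_particles)]
    return particles, log_z
```

For example, `smc_sample(toy_lm, ["DT", "NN", "VB", "END"])` returns only sequences whose tags match the target. Because the toy LM ignores the prefix, every particle receives the same weight at each step. The estimated probability of matching the target therefore equals 0.4 × 0.3 × 0.2 × 0.1 = 0.0024 up to rounding. With a prefix-conditioned LM the weights differ, and resampling moves particles toward prefixes that the LM finds likely.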


