Attention Meets Reachability: Structural Equivalence and Efficiency in Grammar-Constrained LLM Decoding
Faruk Alpay, Bilge Senturk

TL;DR
This paper investigates grammar-constrained decoding in large language models, establishing theoretical bounds, introducing structural ambiguity costs, and connecting these concepts to model architectures and optimization strategies.
Contribution
It provides a formal analysis of grammar-constrained decoding, introduces the structural ambiguity cost metric, and characterizes the complexity and efficiency of decoding algorithms.
Findings
SAC is bounded under right-recursion but grows quadratically under concatenation.
Any efficient online masking engine must incur quadratic work per token.
Minimal-SAC grammar representatives exist within bounded rewrite families.
Abstract
We study grammar-constrained decoding (GCD) as a coupling between an autoregressive next-token distribution and a reachability oracle over a pushdown system compiled from a context-free grammar (CFG). We prove an oracle invariance theorem: language-equivalent grammars induce identical admissible next-token sets for every prefix, hence identical logit masks, yet can yield provably different compiled state spaces and online ambiguity costs. We give exact control-state blowup counts for the canonical language under redundant nonterminal delegation, and introduce a left-to-right structural ambiguity cost (SAC) measuring incremental packed-parse-forest growth per token. For two equivalent grammars over all finite strings, SAC is bounded per token under right-recursion but grows quadratically per token, and faster still cumulatively, under concatenation. We establish engine-independent…
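The coupling between a next-token distribution and a reachability oracle can be illustrated with a minimal sketch. It is not the paper's engine: the three-token vocabulary, the balanced-parentheses language, and the names `admissible` and `mask_logits` are assumptions chosen for illustration. The oracle here is written directly from the language rather than compiled from a CFG, which is consistent with the invariance claim that any language-equivalent grammar must produce the same admissible set and therefore the same mask.

```python
import math

# Hypothetical toy vocabulary; "<eos>" marks the end of the string.
VOCAB = ["(", ")", "<eos>"]

def admissible(prefix):
    """Toy reachability oracle for balanced parentheses.

    Returns the tokens t such that prefix + [t] can still be extended
    to (or already is) a complete string in the language.
    """
    depth = 0  # current stack height of the pushdown system
    for tok in prefix:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        else:
            raise ValueError(f"unexpected token in prefix: {tok!r}")
        if depth < 0:
            raise ValueError("prefix is not viable")
    allowed = {"("}             # opening a bracket is always extendable
    if depth > 0:
        allowed.add(")")        # closing requires an open bracket
    else:
        allowed.add("<eos>")    # stopping requires a balanced prefix
    return allowed

def mask_logits(logits, prefix):
    """Set logits of inadmissible tokens to -inf; leave the rest unchanged."""
    allowed = admissible(prefix)
    return [x if tok in allowed else -math.inf
            for tok, x in zip(VOCAB, logits)]

# The model prefers ")" in both cases, but the mask forbids it on the
# empty prefix and forbids "<eos>" while a bracket is still open.
print(mask_logits([0.1, 2.0, 0.5], []))
print(mask_logits([0.1, 2.0, 0.5], ["("]))
```

The mask depends only on the prefix and the language. What varies between equivalent grammars, according to the abstract, is the size of the compiled state space and the ambiguity cost paid to compute that same mask.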
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Machine Learning and Algorithms
