Attention Meets Reachability: Structural Equivalence and Efficiency in Grammar-Constrained LLM Decoding

Faruk Alpay; Bilge Senturk

arXiv:2603.05540 · cs.CL · March 9, 2026

Open Access

TL;DR

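This paper analyzes grammar-constrained decoding (GCD) in large language models as a coupling between an autoregressive next-token distribution and a reachability oracle over a pushdown system compiled from a context-free grammar. It establishes theoretical bounds, introduces a structural ambiguity cost (SAC), and connects these results to model architectures and optimization strategies.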

Contribution

The paper gives a formal analysis of grammar-constrained decoding, introduces the structural ambiguity cost (SAC) metric, and characterizes the complexity and efficiency of decoding algorithms.

Findings

01

SAC is $O(1)$ per token under right-recursion but grows as $\Theta(t^{2})$ per token, and $\Theta(n^{3})$ cumulatively, under concatenation.

02

Any efficient online masking engine must incur quadratic work per token.

03

Minimal-SAC grammar representatives exist within bounded rewrite families.
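
The per-token and cumulative SAC rates quoted for the concatenation grammar are consistent with each other, as a short worked check shows. This is only a sketch: it assumes the per-token cost at position $t$, written here as $\mathrm{SAC}_t$, satisfies $c_1 t^{2} \le \mathrm{SAC}_t \le c_2 t^{2}$ for all $t \ge 1$, where $\mathrm{SAC}_t$, $c_1$, and $c_2$ are notation introduced for this illustration and not taken from the paper.

$$
\sum_{t=1}^{n} t^{2} = \frac{n(n+1)(2n+1)}{6}
\quad\Longrightarrow\quad
c_1 \,\frac{n(n+1)(2n+1)}{6} \;\le\; \sum_{t=1}^{n} \mathrm{SAC}_t \;\le\; c_2 \,\frac{n(n+1)(2n+1)}{6}
$$

Both bounds are cubic in $n$, so the cumulative cost over a length-$n$ string is $\Theta(n^{3})$. By the same argument, an $O(1)$ per-token cost sums to $O(n)$.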

Abstract

We study grammar-constrained decoding (GCD) as a coupling between an autoregressive next-token distribution and a reachability oracle over a pushdown system compiled from a context-free grammar (CFG). We prove an oracle invariance theorem: language-equivalent grammars induce identical admissible next-token sets for every prefix, hence identical logit masks, yet can yield provably different compiled state spaces and online ambiguity costs. We give exact control-state blowup counts for the canonical $a^{n}b^{n}$ language under redundant nonterminal delegation, and introduce a left-to-right structural ambiguity cost (SAC) measuring incremental packed-parse-forest growth per token. For two equivalent grammars over all finite strings, SAC is $O(1)$ per token under right-recursion but $\Theta(t^{2})$ per token and $\Theta(n^{3})$ cumulatively under concatenation. We establish engine-independent…
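
The oracle invariance claim can be illustrated with a small brute-force sketch. It is not the paper's construction: it replaces the pushdown reachability oracle with bounded enumeration of the language, and the grammar encodings, the `EOS` token, and the helper names `language` and `admissible` are all hypothetical. The enumeration terminates only because these two grammars keep at most one nonterminal in any sentential form. Nonterminal count is used as a crude proxy for compiled state-space size.

```python
from collections import deque

EOS = "<eos>"  # hypothetical end-of-sequence token

# Two language-equivalent grammars for a^n b^n (n >= 0).
# G_DIRECT recurses directly; G_DELEGATED adds a redundant nonterminal A.
G_DIRECT = {"S": [["a", "S", "b"], []]}
G_DELEGATED = {"S": [["A"]], "A": [["a", "S", "b"], []]}


def language(grammar, start, max_len):
    """Enumerate terminal strings with at most max_len symbols (leftmost derivations)."""
    seen, out = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost nonterminal, or None if the form is all terminals.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            out.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminals = sum(1 for s in new if s not in grammar)
            # Prune forms that already exceed the length bound.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return out


def admissible(lang, prefix, alphabet=("a", "b")):
    """Tokens that keep the prefix extendable to some string in lang."""
    allowed = {c for c in alphabet if any(w.startswith(prefix + c) for w in lang)}
    if prefix in lang:
        allowed.add(EOS)
    return allowed


MAX_LEN = 8
LANG_DIRECT = language(G_DIRECT, "S", MAX_LEN)
LANG_DELEGATED = language(G_DELEGATED, "S", MAX_LEN)

# Prefixes chosen well inside the length bound, where truncation has no effect.
PREFIXES = ["", "a", "aa", "aab", "ab", "aabb"]
MASKS_DIRECT = {p: admissible(LANG_DIRECT, p) for p in PREFIXES}
MASKS_DELEGATED = {p: admissible(LANG_DELEGATED, p) for p in PREFIXES}

for p in PREFIXES:
    print(repr(p), sorted(MASKS_DIRECT[p]), MASKS_DIRECT[p] == MASKS_DELEGATED[p])
print("nonterminals:", len(G_DIRECT), "vs", len(G_DELEGATED))
```

In this sketch the masks agree on every tested prefix while the delegated grammar carries an extra nonterminal, which mirrors the abstract's point that equivalent grammars share logit masks but not compiled size. Prefixes close to `MAX_LEN` would see truncated masks, an artifact of bounded enumeration and not of the grammars.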

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Machine Learning and Algorithms