Future Validity is the Missing Statistic: From Impossibility to $\Phi$-Estimation for Grammar-Faithful Speculative Decoding
Wenhua Nie, Zijie Meng, Kun Zou, Zheng Lin, Ziwei Li, Haoran Zheng, Jyh-Shing Roger Jang, Hao Zhang

TL;DR
This paper identifies the missing statistic for grammar-faithful speculative decoding, introduces the future-validity function $\Phi$, …
Contribution
It proposes the future-validity function as a correction statistic, enabling exact or approximate sampling from the grammar-conditional distribution.
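Written out, the correction works roughly as follows. This is a sketch, and the notation is assumed rather than taken from the paper: $p$ is the base model, $\mathcal{L}$ is the grammar's language, $\mathrm{EOS}$ is the end token, and $\Phi(x_{<t})$ is the base-model probability that the prefix $x_{<t}$ is completed into a string of $\mathcal{L}$. The corrected ratio is normalized because $\Phi(x_{<t}) = \sum_{v} p(v \mid x_{<t})\,\Phi(x_{<t} v)$, and the product over a complete string telescopes.

```latex
% Notation assumed for illustration, not taken from the paper.
\Phi(x_{<t}) = \Pr_{y \sim p}\left[\, y \in \mathcal{L} \mid y \text{ extends } x_{<t} \,\right],
\qquad \Phi(x\,\mathrm{EOS}) = \mathbf{1}[x \in \mathcal{L}]

% Doob-transformed next-token law (grammar-conditional target)
q(x_t \mid x_{<t}) = p(x_t \mid x_{<t})\,\frac{\Phi(x_{\le t})}{\Phi(x_{<t})}

% Local masking: Phi replaced by 1 on every viable continuation
q_{\mathrm{loc}}(x_t \mid x_{<t}) \propto p(x_t \mid x_{<t})\,\mathbf{1}[x_{\le t} \text{ is viable}]

% Telescoping over a complete string x = x_{1:T}\,\mathrm{EOS}
q(x) = \prod_{t=1}^{T+1} p(x_t \mid x_{<t})\,\frac{\Phi(x_{\le t})}{\Phi(x_{<t})}
     = \frac{p(x)\,\mathbf{1}[x_{1:T} \in \mathcal{L}]}{\Phi(\varepsilon)}
     = p(x \mid \mathcal{L})
```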
Findings
Exact $\Phi$ …
OneStep reduces the total-variation (TV) gap on Dyck grammars by 14% with under 1% overhead.
Exact dynamic programming reduces the Dyck TV gap by 97%.
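To make the exact-DP finding concrete, here is a minimal sketch on a toy problem. It is not the paper's implementation. It assumes a hypothetical context-free base model with fixed probabilities for `(`, `)` and an end token `$`, uses Dyck-1 as the grammar, and imposes a hypothetical length cap `N`. Under these assumptions $\Phi$ depends only on the current bracket depth and length, so memoized recursion computes it exactly. The locally masked sampler is obtained by replacing $\Phi$ with 1 on every viable token. In this toy, viability is judged against the cap so that the masked sampler never gets stuck. The sketch then compares both samplers with the grammar-conditional law.

```python
from functools import lru_cache

# Hypothetical toy base model: fixed next-token probabilities, independent of context.
P = {"(": 0.5, ")": 0.3, "$": 0.2}  # "$" plays the role of EOS
N = 6  # hypothetical cap on the number of non-EOS tokens


@lru_cache(maxsize=None)
def phi(depth, t):
    """Future validity: probability under the base model that a prefix with this
    bracket depth and length t is completed into a balanced string ending in "$"."""
    total = P["$"] * (depth == 0)  # stopping now is valid only at depth 0
    if t < N:
        total += P["("] * phi(depth + 1, t + 1)
        if depth > 0:
            total += P[")"] * phi(depth - 1, t + 1)
    return total


def child_phi(depth, t, tok):
    """Phi of the prefix extended by tok (Phi(prefix + "$") is 1 iff prefix is balanced)."""
    if tok == "$":
        return 1.0 if depth == 0 else 0.0
    if t >= N or (tok == ")" and depth == 0):
        return 0.0
    return phi(depth + 1 if tok == "(" else depth - 1, t + 1)


def next_token_dist(depth, t, exact):
    """Doob-corrected law (exact=True) or locally masked law (exact=False)."""
    weights = {}
    for tok, p in P.items():
        f = child_phi(depth, t, tok)
        if not exact:
            f = 1.0 if f > 0 else 0.0  # local masking: Phi set to one on viable tokens
        weights[tok] = p * f
    z = sum(weights.values())
    return {tok: w / z for tok, w in weights.items()}


def string_law(exact):
    """Exact probability of each complete string under the token-by-token sampler."""
    law = {}

    def walk(prefix, depth, t, prob):
        for tok, q in next_token_dist(depth, t, exact).items():
            if q == 0.0:
                continue
            if tok == "$":
                law[prefix] = law.get(prefix, 0.0) + prob * q
            else:
                walk(prefix + tok, depth + (1 if tok == "(" else -1), t + 1, prob * q)

    walk("", 0, 0, 1.0)
    return law


def base_prob(s):
    """Base-model probability of string s followed by "$"."""
    prob = P["$"]
    for tok in s:
        prob *= P[tok]
    return prob


exact_law = string_law(exact=True)
local_law = string_law(exact=False)
# Grammar-conditional target: base probability renormalized over balanced strings.
target = {s: base_prob(s) / phi(0, 0) for s in exact_law}
support = set(target) | set(local_law)
tv_local = 0.5 * sum(abs(local_law.get(s, 0.0) - target.get(s, 0.0)) for s in support)
tv_exact = 0.5 * sum(abs(exact_law.get(s, 0.0) - target.get(s, 0.0)) for s in support)
print(f"TV(local, target) = {tv_local:.4f}, TV(exact, target) = {tv_exact:.2e}")
```

With these toy numbers, masking alone assigns the empty string probability 0.2/0.7 ≈ 0.29. The grammar-conditional law assigns it 0.2/0.242375 ≈ 0.83. Masking ignores how unlikely the base model is to close a bracket once it has opened one, while the $\Phi$-weighted sampler reproduces the target up to floating-point error.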
Abstract
Grammar-constrained generation is often combined with local vocabulary masking and speculative decoding, but the resulting sampling law is not the grammar-conditional distribution users usually intend. We show that any speculative decoder with local mask access, Leviathan rejection, and rollback soundness samples from the locally projected distribution rather than the grammar-conditional distribution. This extends the GAD impossibility result to speculative decoding; on Dyck grammars with Qwen3-8B, the total-variation gap can reach 0.996. We identify the future-validity function $\Phi$ as the missing correction statistic. The target distribution is a Doob transform of the base model with $\Phi$, while local masking corresponds to setting $\Phi$ to one. With exact $\Phi$, our oracle decoder FVO-Spec samples…
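The impossibility argument relies on one property of Leviathan-style rejection: it reproduces whatever target distribution the verifier scores against. The single-step sketch below illustrates this property. It is not the paper's proof or FVO-Spec. It computes the exact output law of one accept/reject step. The draft token $x$ is accepted with probability $\min(1, p(x)/q(x))$; otherwise a token is resampled from the normalized residual $\max(0, p - q)$. The `base`, `draft` and `phi_next` values are hypothetical numbers chosen for illustration.

```python
def speculative_step_law(draft, target):
    """Exact output law of one speculative-sampling accept/reject step
    (Leviathan et al.): accept x ~ draft with prob min(1, target/draft),
    otherwise resample from the normalized residual max(0, target - draft)."""
    tokens = set(draft) | set(target)
    # draft(x) * min(1, target(x) / draft(x)) simplifies to min(draft(x), target(x)).
    accept = {x: min(draft.get(x, 0.0), target.get(x, 0.0)) for x in tokens}
    reject_mass = 1.0 - sum(accept.values())
    residual = {x: max(0.0, target.get(x, 0.0) - draft.get(x, 0.0)) for x in tokens}
    z = sum(residual.values())
    return {x: accept[x] + (reject_mass * residual[x] / z if z > 0 else 0.0) for x in tokens}


def normalize(w):
    """Rescale nonnegative weights to sum to one."""
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}


# Hypothetical numbers at one prefix of a Dyck-1 string at depth 0.
base = {"(": 0.5, ")": 0.3, "$": 0.2}        # verifier (base model) next-token law
draft = {"(": 0.6, ")": 0.1, "$": 0.3}       # cheaper draft model
phi_next = {"(": 0.085, ")": 0.0, "$": 1.0}  # hypothetical Phi(prefix + token)

local_target = normalize({x: base[x] * (phi_next[x] > 0) for x in base})  # mask only
exact_target = normalize({x: base[x] * phi_next[x] for x in base})        # Doob-corrected

for name, tgt in [("local", local_target), ("exact", exact_target)]:
    law = speculative_step_law(draft, tgt)
    print(name, {x: round(law[x], 4) for x in sorted(law)})
```

In both cases the emitted token follows the verifier's target exactly. The sampling law therefore depends on which target the verifier uses. A mask-only target yields the locally projected law, and only a $\Phi$-weighted target yields the grammar-conditional one.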
