Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation

Junyi Chen; Shihao Bai; Zaijun Wang; Siyu Wu; Chuheng Du; Hailong Yang; Ruihao Gong; Shengzhong Liu; Fan Wu; Guihai Chen

arXiv:2506.03887 · cs.CL · October 6, 2025

TL;DR

Pre$^3$ converts LR(1) transition graphs into deterministic pushdown automata (DPDA) for constrained decoding, reducing time per output token by up to 40% and increasing throughput by up to 36% in structured LLM generation.

Contribution

The paper presents an approach that transforms LR(1) transition graphs into DPDAs, removing the need for runtime path exploration and making structured output generation in LLMs faster.

Findings

01. Reduced time per output token by up to 40%.

02. Increased throughput by up to 36%.

03. Seamless integration into standard inference frameworks.

Abstract

Extensive LLM applications demand efficient structured generations, particularly for LR(1) grammars, to produce outputs in specified formats (e.g., JSON). Existing methods primarily parse LR(1) grammars into a pushdown automaton (PDA), leading to runtime execution overhead for context-dependent token processing, especially inefficient under large inference batches. To address these issues, we propose Pre$^3$ that exploits deterministic pushdown automata (DPDA) to optimize the constrained LLM decoding efficiency. First, by precomputing prefix-conditioned edges during the preprocessing, Pre$^3$ enables ahead-of-time edge analysis and thus makes parallel transition processing possible. Second, by leveraging the prefix-conditioned edges, Pre$^3$ introduces a novel approach that transforms LR(1) transition graphs into DPDA, eliminating the need for runtime path exploration and achieving edge…
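
To make the idea of deterministic transitions concrete, here is a minimal Python sketch of a toy DPDA for balanced `[ ]` and `{ }` nesting. It is not the Pre$^3$ algorithm and does not model prefix-conditioned edges or LR(1) tables. The names `TABLE`, `step`, `allowed_symbols`, and `accepts`, the single state `"q"`, and the bottom-of-stack marker `"$"` are all assumptions made for illustration. What it shows is the general property the abstract relies on: each (state, input symbol, stack top) triple has at most one transition, so the next configuration and the set of permitted next symbols come from table lookups rather than from exploring alternative paths.

```python
from typing import Dict, List, Optional, Set, Tuple

BOTTOM = "$"                      # hypothetical bottom-of-stack marker
OPENERS = {"[": "]", "{": "}"}    # opener -> matching closer
ALPHABET = ["[", "]", "{", "}"]

Key = Tuple[str, str, str]                 # (state, input symbol, stack top)
Transition = Tuple[str, Tuple[str, ...]]   # (next state, symbols replacing the top)


def build_table() -> Dict[Key, Transition]:
    """Build a transition table with at most one entry per key (determinism)."""
    table: Dict[Key, Transition] = {}
    for opener, closer in OPENERS.items():
        # An opener may follow any stack top: keep the top and push the opener.
        for top in (BOTTOM, *OPENERS):
            table[("q", opener, top)] = ("q", (top, opener))
        # A closer is only valid when the matching opener is on top: pop it.
        table[("q", closer, opener)] = ("q", ())
    return table


TABLE = build_table()


def step(state: str, stack: List[str], symbol: str) -> Optional[Tuple[str, List[str]]]:
    """Apply one transition with a single lookup; None means the symbol is rejected."""
    key = (state, symbol, stack[-1])
    if key not in TABLE:
        return None
    next_state, replacement = TABLE[key]
    return next_state, stack[:-1] + list(replacement)


def allowed_symbols(state: str, stack: List[str]) -> Set[str]:
    """Symbols with a defined transition here (analogous to a token mask)."""
    return {s for s in ALPHABET if (state, s, stack[-1]) in TABLE}


def accepts(text: str) -> bool:
    """Run the automaton; accept if all nesting is closed at the end."""
    state, stack = "q", [BOTTOM]
    for symbol in text:
        result = step(state, stack, symbol)
        if result is None:
            return False
        state, stack = result
    return stack == [BOTTOM]
```

In a constrained-decoding setting, a function in the role of `allowed_symbols` would be used to mask out tokens that the grammar forbids at the current step; in this toy version the mask depends only on the current state and stack top.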

Code & Models

Repositories

modeltc/lightllm (PyTorch, official)

Videos

Pre³: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation · underline