Earley-Driven Dynamic Pruning for Efficient Structured Decoding
Xintong Sun, Chi Wei, Minghao Tian, Shiwen Ni

TL;DR
This paper introduces ZapFormat, a dynamic pruning strategy for constrained decoding in large language models that reduces computational overhead and speeds up structured generation tasks while maintaining high output quality.
Contribution
We propose ZapFormat, a novel Earley algorithm-based dynamic pruning method that improves efficiency and scalability of constrained decoding in LLMs for structured output generation.
Findings
Achieves up to 2x speedup in inference time.
Maintains high-precision compliance with structural constraints.
Applicable across various LLM architectures.
Abstract
Large Language Models (LLMs) have shown remarkable capabilities, yet ensuring that their outputs conform to strict structural or grammatical constraints remains challenging, and such conformance is critical in function calls and domain-specific language (DSL) generation. Constrained decoding with a context-free grammar is a flexible approach to guarantee LLMs' adherence to a specific format by dynamically building a token logits mask. However, creating this mask requires checking the validity of all tokens in the LLM vocabulary at every decoding step, which often incurs significant overhead in existing constrained decoding engines. To address this challenge, we propose ZapFormat, a novel strategy based on the Earley algorithm that identifies and eliminates invalid or redundant Earley states in real time, significantly reducing memory occupation of the Earley algorithm's…
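To make the per-step cost described in the abstract concrete, the following is a minimal sketch of the naive baseline: an Earley recognizer that, at each decoding step, tests every vocabulary token against the grammar to build the mask. It is not ZapFormat's pruning method, whose details are not given here. The toy grammar, the vocabulary, and the names `close`, `start_chart`, `feed`, and `token_mask` are all assumptions made for illustration, and the sketch assumes a grammar with single-character terminals and no empty productions.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy grammar: nested bracketed lists of 'a', e.g. "[a,[a]]".
# Keys are nonterminals; every other symbol is a one-character terminal.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("[", "L", "]"), ("a",)],
    "L": [("S",), ("S", ",", "L")],
}

# Earley item: (lhs, rhs, dot position, origin index in the chart)
Item = Tuple[str, Tuple[str, ...], int, int]
Chart = List[Set[Item]]


def close(chart: Chart, k: int) -> None:
    """Apply predictor and completer to chart[k] until nothing new is added."""
    work = list(chart[k])
    while work:
        lhs, rhs, dot, origin = work.pop()
        if dot < len(rhs):
            sym = rhs[dot]
            if sym in GRAMMAR:  # predictor: expand the expected nonterminal
                new_items = [(sym, prod, 0, k) for prod in GRAMMAR[sym]]
            else:
                new_items = []  # terminal: handled by the scanner in feed()
        else:  # completer: advance items that were waiting for lhs
            new_items = [
                (l2, r2, d2 + 1, o2)
                for (l2, r2, d2, o2) in list(chart[origin])
                if d2 < len(r2) and r2[d2] == lhs
            ]
        for item in new_items:
            if item not in chart[k]:
                chart[k].add(item)
                work.append(item)


def start_chart() -> Chart:
    """Initial chart with an augmented start rule S' -> S."""
    chart: Chart = [{("S'", ("S",), 0, 0)}]
    close(chart, 0)
    return chart


def feed(chart: Chart, text: str) -> Optional[Chart]:
    """Scan text character by character; return None if the prefix becomes invalid."""
    chart = [set(s) for s in chart]  # copy so the caller's chart is untouched
    for ch in text:
        last = chart[-1]
        scanned = {
            (l, r, d + 1, o) for (l, r, d, o) in last if d < len(r) and r[d] == ch
        }
        if not scanned:
            return None
        chart.append(scanned)
        close(chart, len(chart) - 1)
    return chart


def token_mask(chart: Chart, vocab: List[str]) -> List[bool]:
    """Naive mask: try every vocabulary token against the current parser state."""
    return [feed(chart, tok) is not None for tok in vocab]


# Hypothetical vocabulary containing multi-character tokens.
VOCAB = ["[", "]", "a", ",", "[a", "a]", ",a", "]]", "b"]

chart = start_chart()
print(token_mask(chart, VOCAB))   # mask before any output
chart = feed(chart, "[a")         # the model emits the token "[a"
print(token_mask(chart, VOCAB))   # mask for the next step
```

Each call to `token_mask` copies the chart and reruns the scanner once per vocabulary entry, and the chart itself grows by one state set per generated character. Both costs scale poorly for real vocabularies and long outputs, which is the overhead the abstract attributes to existing engines.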
