XGrammar-2: Efficient Dynamic Structured Generation Engine for Agentic LLMs

Linzhang Li; Yixin Dong; Guanjie Wang; Ziyi Xu; Alexander Jiang; Tianqi Chen

arXiv:2601.04426 · cs.AI · March 27, 2026

Open Access

TL;DR

XGrammar-2 is a structured generation engine for dynamic agentic LLM workloads, where output structures vary both across requests and within a request. It supports tag-triggered structure switching and cache reuse across grammars.

Contribution

XGrammar-2 introduces TagDispatch for dynamic structural dispatching and Cross-Grammar Cache for substructure-level cache reuse across grammars.
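
To make the idea of tag-triggered structure switching concrete, here is a minimal sketch. It is not XGrammar-2's TagDispatch implementation or API: it checks already-generated text instead of masking tokens during decoding, and the tag names, the `registry` layout, and the `dispatch` function are all hypothetical. It only illustrates how a begin tag can switch the active structure and how the end tag returns to free text.

```python
import json
import re
from typing import Callable, Dict, List, Tuple

Validator = Callable[[str], bool]


def is_json_object(body: str) -> bool:
    """Return True if body parses as a JSON object."""
    try:
        return isinstance(json.loads(body), dict)
    except json.JSONDecodeError:
        return False


def is_integer(body: str) -> bool:
    """Return True if body is an optionally negative decimal integer."""
    return re.fullmatch(r"-?\d+", body.strip()) is not None


def dispatch(
    text: str, registry: Dict[str, Tuple[str, Validator]]
) -> List[Tuple[str, str, bool]]:
    """Split text into (kind, content, valid) segments.

    kind is "free" for unconstrained text, otherwise the begin tag
    whose structure was active for that segment.
    """
    segments: List[Tuple[str, str, bool]] = []
    pos = 0
    while pos < len(text):
        # Find the earliest begin tag at or after the current position.
        hits = [(text.find(tag, pos), tag) for tag in registry]
        hits = [(i, tag) for i, tag in hits if i != -1]
        if not hits:
            segments.append(("free", text[pos:], True))
            break
        start, tag = min(hits)
        if start > pos:
            segments.append(("free", text[pos:start], True))
        end_tag, validate = registry[tag]
        body_start = start + len(tag)
        end = text.find(end_tag, body_start)
        if end == -1:
            # Unterminated tagged region: report it as invalid.
            segments.append((tag, text[body_start:], False))
            break
        body = text[body_start:end]
        segments.append((tag, body, validate(body)))
        pos = end + len(end_tag)
    return segments


# Hypothetical tags: each begin tag selects a different structure.
registry = {
    "<tool>": ("</tool>", is_json_object),
    "<count>": ("</count>", is_integer),
}
text = 'Let me check. <tool>{"name": "search"}</tool> Result: <count>42</count>'
segments = dispatch(text, registry)
for kind, content, valid in segments:
    print(kind, repr(content), valid)
```

A real engine would apply the selected structure as a constraint on the next token while decoding; the sketch keeps only the switching logic so that it stays short and runnable.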

Findings

01. Over 6x faster compilation than prior engines
02. Near-zero end-to-end overhead in LLM serving systems
03. Effective support for dynamic structure switching and reuse

Abstract

Modern LLM agents increasingly rely on dynamic structured generation, such as tool calling and response protocols. Unlike traditional structured generation with static structures, these workloads vary both across requests and within a request, posing new challenges to existing engines. We present XGrammar-2, a structured generation engine for dynamic agentic workloads. Our design is based on two key ideas: first-class support for tag-triggered structure switching, and fine-grained reuse across requests with different output structures. Concretely, XGrammar-2 introduces TagDispatch for dynamic structural dispatching and Cross-Grammar Cache for substructure-level cache reuse across grammars. It further improves efficiency with an Earley-based adaptive token mask cache, just-in-time compilation, and repetition state compression. Experiments show that XGrammar-2 achieves over 6x faster…
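
The substructure-level reuse mentioned in the abstract can be illustrated with a toy sketch. This is an assumption-laden stand-in, not the paper's Cross-Grammar Cache: regular expressions play the role of grammar substructures, the cache key is simply the pattern text, and the `SubstructureCache` class and its methods are hypothetical names. The point is only that two different "grammars" sharing a substructure pay its compilation cost once.

```python
import re
from typing import Dict


class SubstructureCache:
    """Cache compiled substructures so different grammars can share them."""

    def __init__(self) -> None:
        self._compiled: Dict[str, re.Pattern] = {}
        self.hits = 0
        self.misses = 0

    def compile_rule(self, pattern: str) -> re.Pattern:
        """Compile one substructure, reusing an earlier result if present."""
        if pattern in self._compiled:
            self.hits += 1
        else:
            self.misses += 1
            self._compiled[pattern] = re.compile(pattern)
        return self._compiled[pattern]

    def compile_grammar(self, rules: Dict[str, str]) -> Dict[str, re.Pattern]:
        """Compile every named rule of a grammar through the shared cache."""
        return {name: self.compile_rule(p) for name, p in rules.items()}


# Two hypothetical output structures that share string and number rules.
STRING = r'"[^"]*"'
NUMBER = r"-?\d+"
grammar_a = {"name": STRING, "age": NUMBER}
grammar_b = {"city": STRING, "zip": NUMBER, "flag": r"true|false"}

cache = SubstructureCache()
compiled_a = cache.compile_grammar(grammar_a)
compiled_b = cache.compile_grammar(grammar_b)
print("misses:", cache.misses, "hits:", cache.hits)
```

Compiling `grammar_b` after `grammar_a` only compiles the one rule that is new; the shared string and number rules come from the cache, which is the kind of reuse a whole-grammar cache keyed on the full structure would miss.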

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems