Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse

Jinghui Wang; Shaojie Wang; Yinghan Cui; Xuxing Chen; Chao Wang; Liang Huang; Can Tang; Xiaojiang Zhang; Junyi Peng; Li Wan; Haotian Zhang; Bin Chen

arXiv:2511.00413·cs.LG·April 24, 2026

TL;DR

Tree Training speeds up agentic LLM training by reusing shared prefixes in tree-structured trajectories, so that prefix tokens common to several branches are not recomputed for every branch.

Contribution

The paper proposes DFS serialization and memory-efficient partitioning techniques that enable exact, non-redundant computation over tree-structured token trajectories during training.
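To make the idea of DFS serialization concrete, here is a minimal sketch of how a set of branches could be merged into a prefix tree and flattened in depth-first order. It is not the paper's implementation: the names `Node`, `build_tree`, and `dfs_serialize`, the use of parent indices, and the toy integer tokens are all assumptions made for illustration, and the memory-efficient partitioning step is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One token in the prefix tree (hypothetical structure for illustration)."""
    token: int
    count: int = 0                      # number of branches passing through this token
    children: dict = field(default_factory=dict)


def build_tree(branches):
    """Merge branches into a prefix tree so shared prefixes are stored once."""
    root = Node(token=-1)               # sentinel root, never serialized
    for branch in branches:
        node = root
        for tok in branch:
            node = node.children.setdefault(tok, Node(token=tok))
            node.count += 1
    return root


def dfs_serialize(branches):
    """Flatten the prefix tree in depth-first (pre-order) order.

    Returns three parallel lists:
      tokens  - each tree token exactly once
      parents - index of the parent token in `tokens`, or -1 at the top level
      weights - fraction of branches passing through the token
    """
    root = build_tree(branches)
    tokens, parents, weights = [], [], []
    # Explicit stack; children are pushed in reverse so they pop in insertion order.
    stack = [(child, -1) for child in reversed(list(root.children.values()))]
    while stack:
        node, parent_idx = stack.pop()
        idx = len(tokens)
        tokens.append(node.token)
        parents.append(parent_idx)
        weights.append(node.count / len(branches))
        for child in reversed(list(node.children.values())):
            stack.append((child, idx))
    return tokens, parents, weights


branches = [[1, 2, 3], [1, 2, 4], [1, 5]]
tokens, parents, weights = dfs_serialize(branches)
print(tokens)   # 5 serialized tokens instead of 8 in the linearized branches
print(parents)
```

In a sketch like this, the parent indices carry enough information to recover each token's ancestors, which is one plausible way to restrict a token to its own prefix when the flattened sequence is processed in a single pass.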

Findings

01. Achieves up to 6.2x training speedup on dense and MoE models.

02. Reduces redundant computation in multi-branch token trajectories.

03. Ensures exact log-probability computation across shared prefixes.

Abstract

Agentic large language model (LLM) training often involves multi-turn interaction trajectories that branch into multiple execution paths due to concurrent tool use, think-mode, sub-agent, context management and other runtime designs. As a result, the tokens produced by a single task naturally form a tree-structured token trajectory with shared prefixes, rather than a linear sequence. Existing training pipelines linearize such trajectories and treat each branch independently, leading to substantial redundant computation in both forward and backward passes. We derive that averaging the loss over all branches independently is algebraically identical to a per-token weighted loss, where each token's weight equals the fraction of branches passing through it. The problem therefore reduces to computing the log-probability of every token in the prefix tree exactly once, with no repeated…
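The equivalence stated in the abstract can be checked numerically on a toy example. The sketch below assumes each branch's loss is the sum of its per-token losses, and that a token in the tree is identified by the full prefix leading to it. The function names and the stand-in `toy_loss` are hypothetical; a real pipeline would use the model's negative log-probabilities.

```python
from collections import defaultdict


def branch_average_loss(branches, token_loss):
    """Average of per-branch summed losses, re-evaluating shared prefixes."""
    total, evaluations = 0.0, 0
    for branch in branches:
        for depth in range(1, len(branch) + 1):
            total += token_loss(tuple(branch[:depth]))
            evaluations += 1
    return total / len(branches), evaluations


def tree_weighted_loss(branches, token_loss):
    """Per-token weighted loss: each tree token is evaluated exactly once."""
    counts = defaultdict(int)           # prefix -> number of branches through it
    for branch in branches:
        for depth in range(1, len(branch) + 1):
            counts[tuple(branch[:depth])] += 1
    n = len(branches)
    total = sum((c / n) * token_loss(prefix) for prefix, c in counts.items())
    return total, len(counts)


def toy_loss(prefix):
    """Stand-in for -log p(token | prefix); any deterministic function works."""
    return 0.1 * len(prefix) + 0.01 * prefix[-1]


branches = [[1, 2, 3], [1, 2, 4], [1, 5]]
linear, linear_evals = branch_average_loss(branches, toy_loss)
tree, tree_evals = tree_weighted_loss(branches, toy_loss)
print(abs(linear - tree) < 1e-9)        # the two losses agree
print(linear_evals, tree_evals)         # 8 evaluations versus 5
```

In symbols, with B branches, per-token loss ℓ_t, and n_t branches passing through tree token t, the identity being checked is (1/B) Σ_b Σ_{t in b} ℓ_t = Σ_t (n_t / B) ℓ_t.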
