Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse
Jinghui Wang, Shaojie Wang, Yinghan Cui, Xuxing Chen, Chao Wang, Liang Huang, Can Tang, Xiaojiang Zhang, Junyi Peng, Li Wan, Haotian Zhang, Bin Chen

TL;DR
Tree Training speeds up agentic LLM training by reusing the shared prefixes of tree-structured trajectories, so that tokens common to several branches are not recomputed for each branch.
Contribution
It proposes DFS serialization and memory-efficient partitioning techniques to enable exact, non-redundant computation over tree-structured token trajectories during training.
Findings
Achieves up to 6.2x training speedup on dense and MoE models.
Reduces redundant computation in multi-branch token trajectories.
Ensures exact log-probability computation across shared prefixes.
Abstract
Agentic large language model (LLM) training often involves multi-turn interaction trajectories that branch into multiple execution paths due to concurrent tool use, think-mode, sub-agent, context management and other runtime designs. As a result, the tokens produced by a single task naturally form a tree-structured token trajectory with shared prefixes, rather than a linear sequence. Existing training pipelines linearize such trajectories and treat each branch independently, leading to substantial redundant computation in both forward and backward passes. We derive that averaging the loss over all branches independently is algebraically identical to a per-token weighted loss, where each token's weight equals the fraction of branches passing through it. The problem therefore reduces to computing the log-probability of every token in the prefix tree exactly once, with no repeated…
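The equivalence stated in the abstract can be checked numerically on a toy example. The sketch below is not the authors' implementation: it assumes each branch's loss is the sum of its per-token negative log-probabilities, and it replaces the model with a hypothetical deterministic function `toy_nll` that depends only on the token prefix. The helper names (`prefix_weights`, `branch_averaged_loss`, `tree_weighted_loss`) are illustrative.

```python
from collections import defaultdict


def prefix_weights(branches):
    """Weight of each prefix-tree node = fraction of branches passing through it.

    A node is identified by the full token prefix that leads to it.
    """
    counts = defaultdict(int)
    for branch in branches:
        for i in range(1, len(branch) + 1):
            counts[tuple(branch[:i])] += 1
    n = len(branches)
    return {node: c / n for node, c in counts.items()}


def toy_nll(prefix):
    """Hypothetical stand-in for -log p(last token | earlier tokens).

    It depends only on the prefix, as a causal language model's output would.
    """
    return 0.1 * len(prefix) + 0.01 * (sum(prefix) % 7)


def branch_averaged_loss(branches):
    """Linearized baseline: score every branch independently, then average."""
    total = 0.0
    for branch in branches:
        total += sum(toy_nll(tuple(branch[:i])) for i in range(1, len(branch) + 1))
    return total / len(branches)


def tree_weighted_loss(branches):
    """Tree form: score each prefix-tree node exactly once, scaled by its weight."""
    weights = prefix_weights(branches)
    return sum(w * toy_nll(node) for node, w in weights.items())


# Three branches sharing the prefix [1, 2]; two of them also share [1, 2, 3].
branches = [[1, 2, 3, 4], [1, 2, 3, 5, 6], [1, 2, 7]]

linear_evals = sum(len(b) for b in branches)   # token evaluations when linearized
tree_evals = len(prefix_weights(branches))     # token evaluations on the tree
gap = abs(branch_averaged_loss(branches) - tree_weighted_loss(branches))
```

In this toy case the linearized baseline evaluates 12 tokens while the prefix tree has 7 distinct nodes, and the two losses agree up to floating-point rounding. If each branch's loss were instead a per-branch mean over tokens, the node weights would need a different form than the plain branch fraction used here.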
