ArborKV: Structure-Aware KV Cache Management for Scaling Tree-based LLM Reasoning

Yeqiu Chen; Ziyan Liu; Zhenxin Huang; Runquan Gui; Hong Wang; Lei Liu

arXiv:2605.22106 · cs.AI · May 22, 2026

TL;DR

ArborKV is a KV cache management method for tree-based LLM reasoning that cuts peak KV memory by up to ~4x while keeping accuracy close to full retention, enabling larger searches within fixed hardware constraints.

Contribution

ArborKV introduces a structure-aware eviction framework that pairs a lightweight estimator with lazy rehydration, improving memory efficiency for Tree-of-Thoughts (ToT)-style reasoning.

Findings

01. Achieves up to ~4x peak KV-memory reduction.

02. Maintains near-full-retention accuracy.

03. Enables larger search configurations under fixed memory budgets.

Abstract

Recent progress in LLM reasoning has increasingly shifted from single-pass generation to explicit search over intermediate reasoning states. Tree-of-Thoughts (ToT) organizes inference as a tree-structured search with branching and backtracking, but it substantially amplifies the Key-Value (KV) cache: retaining KV states for a frontier of partial trajectories quickly becomes a memory bottleneck that limits throughput and constrains search depth and width under fixed hardware budgets. We address this challenge by observing that KV reuse in ToT-style inference is governed by search dynamics: near-term decoding depends primarily on the active branch and its ancestors, whereas inactive subtrees have low short-term reuse probability yet must remain recoverable for backtracking. Motivated by this, we propose ArborKV, a structure-aware eviction framework that couples a lightweight value…
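
The observation in the abstract (the active branch and its ancestors are needed now, while inactive subtrees only need to stay recoverable) can be illustrated with a toy model. The sketch below is not ArborKV's algorithm: it uses the simplest possible policy, evicting everything off the active root-to-node path, and it does not model the paper's estimator at all. The names `Node`, `activate`, and `resident_tokens` are hypothetical, and "rehydration" is represented only as a count of tokens whose KV would have to be restored on backtracking.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity-based equality/hash so nodes can live in a set
class Node:
    name: str
    tokens: int  # number of tokens whose KV entries this node owns
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    resident: bool = True  # True if this node's KV is currently in memory

    def add_child(self, name: str, tokens: int) -> "Node":
        child = Node(name, tokens, parent=self)
        self.children.append(child)
        return child


def path_to_root(node: Optional[Node]) -> List[Node]:
    """Return the node and all of its ancestors."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def all_nodes(root: Node) -> Iterator[Node]:
    """Iterate over every node in the tree (order is not significant)."""
    stack = [root]
    while stack:
        n = stack.pop()
        yield n
        stack.extend(n.children)


def activate(root: Node, active: Node) -> int:
    """Keep the active branch and its ancestors resident; evict the rest.

    Returns the number of tokens that had to be rehydrated, i.e. KV that
    was evicted earlier and is needed again (assumed here to be restored
    lazily, only when the search returns to that branch).
    """
    keep = set(path_to_root(active))
    rehydrated = 0
    for n in all_nodes(root):
        if n in keep:
            if not n.resident:
                rehydrated += n.tokens
                n.resident = True
        else:
            n.resident = False
    return rehydrated


def resident_tokens(root: Node) -> int:
    """Total tokens whose KV is currently held in memory."""
    return sum(n.tokens for n in all_nodes(root) if n.resident)
```

In this toy setting, expanding one branch evicts its siblings for free, and the cost is paid only if the search backtracks to an evicted branch. A real policy would have to weigh that rehydration cost against the memory saved, which is presumably where an estimator of reuse comes in.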
