TreeKV: Smooth Key-Value Cache Compression with Tree Structures

Ziwei He; Jian Yuan; Haoli Bai; Jingwen Leng; Bo Jiang

arXiv:2501.04987 · cs.CL · May 19, 2025

TL;DR

TreeKV introduces a training-free, tree-structured KV cache compression method that preserves language model quality on long sequences while substantially reducing cache size.

Contribution

It presents a novel, smooth cache compression technique using tree structures, enabling effective long-sequence processing without additional training.

Findings

01

Outperforms baseline models in language modeling on PG19 and OpenWebText2.

02

Enables models trained on short contexts to generalize to longer contexts with 16x cache reduction.

03

Achieves top performance on LongBench with only 6% of the original cache budget.

Abstract

Efficient key-value (KV) cache compression is critical for scaling transformer-based Large Language Models (LLMs) in long sequences and resource-limited settings. Existing methods evict tokens based on their positions or importance scores, but position-based strategies can miss crucial information outside predefined regions, while those relying on global importance scores result in strong regional biases, limiting the KV cache's overall context retention and potentially impairing the performance of LLMs on complex tasks. Our wavelet analysis reveals that as tokens approach the end of the sequence, their contributions to generation gradually increase and tend to diverge more from neighboring tokens, indicating a smooth transition with increasing complexity and variability from distant to nearby context. Motivated by this observation, we propose TreeKV, an intuitive, training-free method…
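The two eviction families contrasted in the abstract can be made concrete with a small toy sketch. This is not TreeKV's algorithm, which the abstract does not spell out. As assumptions, the position-based policy is modeled on StreamingLLM-style eviction (keep a few initial "sink" tokens plus a recent window), and the importance-based policy keeps the top-scoring tokens under one global score per token, in the spirit of H2O's accumulated attention. The function names, the sink count of 4, and the score values are hypothetical illustrations.

```python
# Toy comparison of two KV-cache eviction families (illustrative only, not TreeKV).
from typing import List, Sequence


def position_based_keep(seq_len: int, budget: int, num_sink: int = 4) -> List[int]:
    """Keep the first `num_sink` tokens plus the most recent ones (StreamingLLM-style)."""
    if budget >= seq_len:
        return list(range(seq_len))
    num_sink = min(num_sink, budget)
    num_recent = budget - num_sink
    return list(range(num_sink)) + list(range(seq_len - num_recent, seq_len))


def importance_based_keep(scores: Sequence[float], budget: int) -> List[int]:
    """Keep the `budget` tokens with the highest global score (H2O-style), in position order."""
    if budget >= len(scores):
        return list(range(len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def segment_coverage(kept: Sequence[int], seq_len: int, num_segments: int = 4) -> List[int]:
    """Count how many retained tokens fall into each equal-length segment of the sequence."""
    counts = [0] * num_segments
    for i in kept:
        counts[i * num_segments // seq_len] += 1
    return counts


SEQ_LEN, BUDGET = 64, 8
# Hypothetical scores that rise toward the end of the sequence, plus one
# crucial token in the middle (index 30) with a very high score.
scores = [i / SEQ_LEN for i in range(SEQ_LEN)]
scores[30] = 5.0

pos = position_based_keep(SEQ_LEN, BUDGET)
imp = importance_based_keep(scores, BUDGET)
print(pos, segment_coverage(pos, SEQ_LEN))  # [0, 1, 2, 3, 60, 61, 62, 63] [4, 0, 0, 4]
print(imp, segment_coverage(imp, SEQ_LEN))  # [30, 57, 58, 59, 60, 61, 62, 63] [0, 1, 0, 7]
```

In this toy run the position-based cache drops the high-scoring token at index 30 because it lies outside both predefined regions, while the importance-based cache keeps it but spends its other seven slots on the final quarter of the sequence, leaving the first half with a single token. These correspond to the two failure modes the abstract describes.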

Taxonomy

Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Parallel Computing and Optimization Techniques