Graph-Guided Adaptive Channel Elimination for KV Cache Compression

Enwei Tong; Yao Zhu; Yuanchao Bai; Kai Wang; Xianming Liu; Xiangyang Ji

arXiv:2604.16983·eess.SP·April 21, 2026

TL;DR

GRACE is a graph-based framework for KV cache compression in large language models, achieving 60% size reduction with minimal performance loss by modeling channel interactions and protecting salient channels.

Contribution

The paper introduces a graph-guided approach to adaptive channel elimination for KV cache compression that accounts for both inter-channel interactions and channel saliency.

Findings

01. Reduces KV cache size by 60% with negligible performance loss.

02. Outperforms state-of-the-art methods in cache compression.

03. Models channel interactions as a graph for optimized pruning.
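
To put the 60% figure in context, the following is a back-of-the-envelope sketch of KV cache size using the standard formula (keys plus values, per layer, per head, per token). The model configuration (32 layers, 32 KV heads, head dimension 128, fp16 values, 32,768 tokens) is a hypothetical example and is not taken from the paper; the function name `kv_cache_bytes` is likewise illustrative.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2, batch=1):
    """Size of an uncompressed KV cache in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value * batch


# Hypothetical configuration, chosen only for illustration.
full = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=32768)

# Apply the 60% size reduction reported in the findings above.
compressed = full * (1 - 0.60)

print(f"{full / 2**30:.1f} GiB -> {compressed / 2**30:.1f} GiB")
```

Under these assumed settings the cache shrinks from 16 GiB to roughly 6.4 GiB per sequence. The listing does not say how the reduction is split between keys and values, so the sketch applies it to the total.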

Abstract

Large Language Models have revolutionized natural language processing, achieving unprecedented success across a vast range of tasks. However, their practical application in long-context scenarios is severely hampered by the formidable memory footprint of the Key-Value cache. While channel pruning has emerged as a promising compression strategy, existing methods evaluate channel importance in isolation, fundamentally ignoring the inter-channel interactions that collectively dictate model performance. This oversight leads to suboptimal pruning decisions. To address this, we introduce GRACE (GRaph-guided Adaptive Channel Elimination), a novel framework that reframes KV cache compression as a graph-based optimization problem. GRACE models channels as nodes and their interactions as weighted edges, enabling the identification of a near-optimal…
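
The idea of treating channels as nodes and their interactions as weighted edges can be illustrated with a toy sketch. This is not the GRACE algorithm, whose details are not given in the abstract. It assumes, purely for illustration, that node saliency is the mean absolute activation of a channel, that an edge weight is the absolute cosine similarity between two channel columns, and that selection is a greedy pass that protects the most salient channels and discounts candidates that are redundant with channels already kept. All function names and parameters are hypothetical.

```python
from itertools import combinations


def channel_saliency(keys):
    """Node weight: mean absolute activation per channel (assumed proxy)."""
    n_tokens, n_channels = len(keys), len(keys[0])
    return [sum(abs(row[c]) for row in keys) / n_tokens for c in range(n_channels)]


def interaction_weights(keys):
    """Edge weight: absolute cosine similarity between channel columns (assumed)."""
    n_channels = len(keys[0])
    cols = [[row[c] for row in keys] for c in range(n_channels)]
    norms = [sum(v * v for v in col) ** 0.5 for col in cols]
    weights = {}
    for i, j in combinations(range(n_channels), 2):
        dot = sum(a * b for a, b in zip(cols[i], cols[j]))
        denom = norms[i] * norms[j]
        weights[(i, j)] = abs(dot) / denom if denom > 0 else 0.0
    return weights


def select_channels(keys, keep, protect=1):
    """Greedily choose `keep` channels; the top `protect` by saliency are always kept."""
    saliency = channel_saliency(keys)
    weights = interaction_weights(keys)

    def edge(i, j):
        return weights[(min(i, j), max(i, j))]

    order = sorted(range(len(saliency)), key=lambda c: -saliency[c])
    kept = order[:protect]            # protected salient channels
    candidates = set(order[protect:])
    while len(kept) < keep and candidates:
        # Discount saliency by the strongest interaction with a kept channel,
        # so a near-duplicate of a kept channel contributes little.
        def gain(c):
            return saliency[c] * (1.0 - max(edge(c, k) for k in kept))

        best = max(sorted(candidates), key=gain)
        kept.append(best)
        candidates.remove(best)
    return sorted(kept)


# Toy key cache: 4 tokens x 4 channels. Channel 1 is a scaled copy of channel 0.
keys = [
    [4.0, 3.0, 1.0, 0.1],
    [-4.0, -3.0, 1.0, 0.1],
    [4.0, 3.0, -1.0, -0.1],
    [-4.0, -3.0, -1.0, 0.1],
]
selected = select_channels(keys, keep=2)
```

In this toy input, ranking channels by saliency alone would keep channels 0 and 1, even though channel 1 carries no information beyond channel 0. The interaction-aware pass keeps channels 0 and 2 instead, which is the kind of decision that scoring channels in isolation would miss.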
