CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference

Chao Fei; Guozhong Li; Chenxi Liu; Panos Kalnis

arXiv:2602.20732·cs.AI·February 25, 2026

Open Access

TL;DR

CHESS is a KV-cache management system for long-context LLM inference. It uses a context-aware, hierarchical selection policy to rebuild the relevant context at each decoding step, surpassing Full-KV quality with only 1% of the KV cache and delivering up to 4.56× higher throughput.

Contribution

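CHESS introduces an algorithm-system co-design for KV-cache management: a context-aware hierarchical selection policy paired with coarse-granularity selection that avoids expensive data movement, so that theoretical sparsity turns into practical speedups on long-context inference.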

Findings

01. Surpasses Full-KV quality using only 1% of the KV cache

02. Achieves up to 4.56× higher throughput

03. Outperforms existing baselines in stability and speed

Abstract

Long-context LLMs demand accurate inference at low latency, yet decoding becomes primarily constrained by KV cache as context grows. Prior pruning methods are largely context-agnostic: their token selection ignores step-wise relevance and local semantics, which undermines quality. Moreover, their irregular accesses and selection overheads yield only limited wall-clock speedups. To address this, we propose CHESS, an algorithm-system co-design KV-cache management system. Algorithmically, CHESS introduces a context-aware, hierarchical selection policy that dynamically reconstructs a coherent context for the current decoding. System-wise, coarse granularity selection eliminates expensive data movement, fully realizing practical acceleration from theoretical sparsity. Extensive evaluations demonstrate that CHESS surpasses Full-KV quality using only 1% of the KV…
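To make the idea of coarse-granularity, step-wise selection concrete, here is a minimal Python sketch of generic block-level top-k KV selection. It is not the CHESS policy, whose details are not given in the abstract. Everything in it is an assumption made for illustration: the function names `block_summaries` and `select_blocks`, the mean-key block summary, the dot-product relevance score, and the parameters `block_size` and `budget_blocks`.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def block_summaries(keys: Sequence[Vector], block_size: int) -> List[List[float]]:
    """Summarize each contiguous block of keys by its mean vector (assumed summary)."""
    summaries = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        summaries.append(
            [sum(k[d] for k in block) / len(block) for d in range(dim)]
        )
    return summaries


def select_blocks(
    query: Vector,
    keys: Sequence[Vector],
    block_size: int,
    budget_blocks: int,
) -> List[Tuple[int, int]]:
    """Return [start, end) token ranges of the blocks most relevant to `query`.

    Relevance is the dot product between the current query and each block
    summary (an illustrative choice). Selecting whole blocks keeps the
    retained KV entries contiguous in memory instead of scattered per token.
    """
    summaries = block_summaries(keys, block_size)
    scores = [sum(q * s for q, s in zip(query, summ)) for summ in summaries]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:budget_blocks])  # restore positional order
    return [
        (i * block_size, min((i + 1) * block_size, len(keys))) for i in chosen
    ]


# Toy example: 8 cached tokens with 2-dimensional keys, grouped into blocks of 2.
toy_keys = [
    [0.0, 1.0], [0.0, 1.0],    # block 0
    [1.0, 0.0], [1.0, 0.0],    # block 1
    [0.5, 0.0], [0.5, 0.0],    # block 2
    [-1.0, 0.0], [-1.0, 0.0],  # block 3
]
selected = select_blocks([1.0, 0.0], toy_keys, block_size=2, budget_blocks=2)
print(selected)  # [(2, 4), (4, 6)]
```

Because the selection is re-run with each new query vector, the retained context can change from one decoding step to the next, which is the step-wise relevance that context-agnostic pruning ignores. Returning contiguous ranges rather than individual token indices is what keeps memory access regular in this sketch.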

Taxonomy

Topics: Data Quality and Management · Advanced Data Storage Technologies · Natural Language Processing Techniques