SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning

Sanjay Kariyappa; G. Edward Suh

arXiv:2602.22603·cs.AI·March 3, 2026

Open Access

TL;DR

SideQuest is a model-driven KV cache compression method for long-horizon reasoning tasks: the reasoning model itself manages the cache, reducing peak token usage by up to 65% with minimal accuracy degradation.

Contribution

The paper presents an approach in which the reasoning model itself performs KV cache compression, addressing the limitations of heuristic methods on multi-step reasoning tasks.

Findings

01. Reduces peak token usage by up to 65%

02. Achieves minimal accuracy degradation

03. Outperforms heuristic-based compression techniques

Abstract

Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distributed across multiple webpages and documents. In such tasks, the LLM context is dominated by tokens from external retrieval, causing memory usage to grow rapidly and limiting decode performance. While several KV cache compression techniques exist for long-context inputs, we find that existing heuristics fail to support multi-step reasoning models effectively. We address this challenge with SideQuest -- a novel approach that leverages the Large Reasoning Model (LRM) itself to perform KV cache compression by reasoning about the usefulness of tokens in its context. To prevent the tokens associated with this management process from polluting the model's memory, we frame KV cache compression as an auxiliary task executed in parallel to the main reasoning task. Our evaluations, using a model…
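The idea of running cache management as an auxiliary task can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify the mechanism, so `Segment`, `side_pass`, `run`, `demo_steps`, and `stale_retrieval_judge` are hypothetical names, the judge is a hand-written rule standing in for the reasoning model's own decision, and the side pass runs between steps on a snapshot rather than truly in parallel. What it shows is the one property the abstract states: the management pass decides what to evict, but nothing it produces is appended to the main context.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    seg_id: int
    kind: str    # "reasoning" or "retrieval"
    tokens: int  # tokens this segment occupies in the (simulated) KV cache


def total_tokens(context: List[Segment]) -> int:
    return sum(s.tokens for s in context)


def side_pass(context: List[Segment],
              judge: Callable[[List[Segment]], List[int]]) -> List[int]:
    """Auxiliary task: inspect a snapshot and return segment ids to evict.

    Anything the judge would generate while deliberating stays inside this
    call; only the list of ids flows back to the main task.
    """
    snapshot = list(context)  # the side task works on a copy
    return judge(snapshot)


def run(steps: List[Segment], judge,
        manage_every: int = 2) -> Tuple[List[Segment], int]:
    """Append segments one by one, compress periodically, track peak usage."""
    context: List[Segment] = []
    peak = 0
    for i, seg in enumerate(steps, start=1):
        context.append(seg)
        peak = max(peak, total_tokens(context))
        if i % manage_every == 0:
            evict = set(side_pass(context, judge))
            context = [s for s in context if s.seg_id not in evict]
    return context, peak


def stale_retrieval_judge(snapshot: List[Segment]) -> List[int]:
    # Hypothetical stand-in for the model's judgement:
    # evict every retrieval segment except the most recent one.
    retrieval = [s for s in snapshot if s.kind == "retrieval"]
    return [s.seg_id for s in retrieval[:-1]]


def demo_steps() -> List[Segment]:
    # Three rounds of "retrieve a large page, then reason briefly about it".
    steps = []
    for i in range(6):
        if i % 2 == 0:
            steps.append(Segment(i, "retrieval", 1000))
        else:
            steps.append(Segment(i, "reasoning", 50))
    return steps


if __name__ == "__main__":
    _, baseline_peak = run(demo_steps(), lambda snapshot: [])
    context, managed_peak = run(demo_steps(), stale_retrieval_judge)
    print("peak without management:", baseline_peak)
    print("peak with management:   ", managed_peak)
    print("tokens left in context: ", total_tokens(context))
```

In this toy setup the retrieval segments dominate the context, mirroring the abstract's observation, so evicting stale ones lowers the peak while every reasoning segment is retained. The numbers it prints describe only this made-up workload and say nothing about the paper's reported results.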

Taxonomy

Topics: Semantic Web and Ontologies · Multi-Agent Systems and Negotiation · Natural Language Processing Techniques