CodeComp: Structural KV Cache Compression for Agentic Coding
Qiujiang Chen, Jing Xiong, Chenyang Zhao, Sidi Yang, Ngai Wong

TL;DR
CodeComp is a KV cache compression method for agentic coding tasks that adds static-analysis signals to attention-based token scoring, improving accuracy under a fixed memory budget without any model training.
Contribution
CodeComp is a training-free framework that uses Code Property Graph priors, extracted by Joern through static program analysis, to guide KV cache compression during LLM inference on code tasks.
Findings
Outperforms attention-only compression baselines on bug localization and code generation benchmarks under equal memory budgets.
Recovers the majority of full-context accuracy under aggressive KV cache compression.
Maintains patch generation quality comparable to uncompressed inference.
Abstract
Agentic code tasks such as fault localization and patch generation require processing long codebases under tight memory constraints, where the Key-Value (KV) cache becomes the primary inference bottleneck. Existing compression methods rely exclusively on attention signals to estimate token importance, systematically discarding structurally critical tokens such as call sites, branch conditions, and assignments that are essential for code understanding. We present CodeComp, a training-free KV cache compression framework that incorporates static program analysis into LLM inference via Code Property Graph priors extracted by Joern. Across bug localization and code generation benchmarks, CodeComp consistently outperforms attention-only compression baselines under equal memory budgets, recovering the majority of full-context accuracy under aggressive KV cache compression, while matching the…
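To make the idea of a structural prior concrete, here is a minimal Python sketch of budgeted token selection. It is not the authors' scoring rule: the additive `prior_weight` bonus, the function name `select_kv_tokens`, and the toy scores are all assumptions made for illustration, and the structural token indices are supplied by hand rather than extracted from a Joern Code Property Graph.

```python
from typing import List, Set


def select_kv_tokens(
    attn_scores: List[float],
    structural: Set[int],
    budget: int,
    prior_weight: float = 1.0,
) -> List[int]:
    """Return the sorted indices of tokens to keep in the KV cache.

    Hypothetical scoring rule (an assumption, not the paper's formula):
    each token's attention score gets an additive bonus of `prior_weight`
    if static analysis marked it as structurally important.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    combined = [
        score + (prior_weight if i in structural else 0.0)
        for i, score in enumerate(attn_scores)
    ]
    # Rank by combined score, highest first; break ties by position.
    ranked = sorted(range(len(combined)), key=lambda i: (-combined[i], i))
    # Keep the top `budget` tokens, restored to their original order.
    return sorted(ranked[:budget])


# Toy example: tokens 1 and 3 stand in for a call site and a branch
# condition that receive little attention mass.
attn = [0.30, 0.05, 0.25, 0.02, 0.20, 0.18]
structural_tokens = {1, 3}

attention_only = select_kv_tokens(attn, structural_tokens, budget=3, prior_weight=0.0)
with_prior = select_kv_tokens(attn, structural_tokens, budget=3, prior_weight=1.0)
print(attention_only)  # [0, 2, 4]
print(with_prior)      # [0, 1, 3]
```

With the prior disabled, both structurally marked tokens are evicted because their attention scores are low, which is the failure mode the abstract attributes to attention-only methods. With the prior enabled, they survive within the same budget of three tokens.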
