How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers

Xiao Wang

arXiv:2604.17935·cs.LG·April 21, 2026

TL;DR

This paper investigates the theoretical limits of KV cache compression in Transformer models for multi-step reasoning, providing bounds on depth, bandwidth, and error scaling.

Contribution

It introduces new bounds on Transformer depth related to cache size and compression, and analyzes the impact of cache adaptivity on reasoning accuracy.

Findings

01

Conjectures a product lower bound on Transformer depth under a compressed KV cache and unconditionally proves an upper bound.

02

Identifies bandwidth limitations when attention dimension times precision exceeds log n.

03

Shows adaptive caches outperform oblivious caches in error scaling for multi-hop reasoning.

Abstract

The key-value (KV) cache is the dominant memory bottleneck during Transformer inference, yet little is known theoretically about how aggressively it can be compressed before multi-step reasoning degrades. We study this through $k$-hop pointer chasing on $n$ tokens under a shared KV cache of size $s$, attention dimension $m$, $H$ heads, $p$-bit precision, and a locality-respecting cache controller (satisfied by all standard KV-compression methods). We give three results. (1) Product depth lower bound (conjectured). We conjecture that any such Transformer ($n \geq 4k$, $s \leq n/4$) requires depth $L = \Omega(\lceil k/s \rceil \cdot \lceil \log_2 n / (Hmp) \rceil)$, and isolate the sole remaining gap as a probabilistic step on the joint distribution of cache trace and pointer chain. Unconditionally, we prove a matching upper bound $L = O(\min(k, \lceil k/s \rceil \log s) \cdot \log…
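To make the quantities in the abstract concrete, the following is a minimal Python sketch of the $k$-hop pointer-chasing task and of the product term inside the conjectured lower bound. The function names `k_hop` and `conjectured_depth_term` are hypothetical and not taken from the paper. The second function drops the constant hidden by the $\Omega$ notation, so it only shows how the expression scales, and the bound it evaluates is conjectured rather than proven.

```python
import math


def k_hop(pointers, start, k):
    """Follow the pointer array k times from `start` (the k-hop task).

    `pointers[i]` is the index that token i points to.
    """
    pos = start
    for _ in range(k):
        pos = pointers[pos]
    return pos


def conjectured_depth_term(n, k, s, H, m, p):
    """Evaluate ceil(k/s) * ceil(log2(n) / (H*m*p)).

    This is the product inside the conjectured Omega(...) bound,
    with the hidden constant ignored (an assumption of this sketch).
    The regime n >= 4k and s <= n/4 is the one stated in the abstract.
    """
    if n < 4 * k or s > n / 4:
        raise ValueError("outside the stated regime n >= 4k, s <= n/4")
    cache_factor = math.ceil(k / s)  # hops per cache-sized batch
    bandwidth_factor = math.ceil(math.log2(n) / (H * m * p))  # bits per layer
    return cache_factor * bandwidth_factor


if __name__ == "__main__":
    pointers = [2, 0, 3, 1]
    print(k_hop(pointers, start=0, k=3))  # follows 0 -> 2 -> 3 -> 1
    print(conjectured_depth_term(n=1024, k=64, s=16, H=1, m=1, p=2))
```

In this sketch the bandwidth factor collapses to 1 once $Hmp \geq \log_2 n$, leaving only the cache factor $\lceil k/s \rceil$.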
