EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models

Minsoo Cheong; Donghyun Son; Woosang Lim; Sungjoo Yoo

arXiv:2603.18489 · cs.CL · March 20, 2026

TL;DR

EntropyCache is a training-free KV caching method for diffusion language models that uses the entropy of newly decoded tokens to decide cheaply when to recompute cached states, substantially speeding up inference with minimal overhead.

Contribution

It introduces an entropy-based decision mechanism for KV cache updates whose cost is constant: it does not grow with context length or model depth.
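The constant-cost claim rests on the decision signal being computed only from the output distribution of each newly decoded token. As a minimal sketch, assuming the signal is the Shannon entropy of the softmax over the V vocabulary logits (the function name `token_entropy` is hypothetical, not from the paper), one token's entropy takes a few linear passes over V values, with no dependence on context length or layer count:

```python
import math
from typing import Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of softmax(logits) for one decoded token.

    Linear passes over the V vocabulary logits: O(V) time, independent of
    context length and of how many transformer layers produced the logits.
    """
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    log_z = math.log(z)
    # H = -sum_i p_i * log p_i, where log p_i = (x_i - m) - log z
    return -sum((e / z) * ((x - m) - log_z) for e, x in zip(exps, logits))


print(round(token_entropy([0.0, 0.0, 0.0, 0.0]), 4))   # 1.3863 (= ln 4, uniform)
print(round(token_entropy([20.0, 0.0, 0.0, 0.0]), 4))  # 0.0 (near-deterministic)
```

A confident token (low entropy) suggests little change to the surrounding cached states, while an uncertain one (high entropy) is the kind of event this sketch would treat as a cue to recompute.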

Findings

1. Achieves a 15.2x-26.4x speedup on standard benchmarks.
2. Maintains competitive accuracy with minimal decision overhead.
3. Spends only 0.5% of inference time on the skip-or-recompute decision.

Abstract

Diffusion-based large language models (dLLMs) rely on bidirectional attention, which prevents lossless KV caching and requires a full forward pass at every denoising step. Existing approximate KV caching methods reduce this cost by selectively updating cached states, but their decision overhead scales with context length or model depth. We propose EntropyCache, a training-free KV caching method that uses the maximum entropy of newly decoded token distributions as a constant-cost signal for deciding when to recompute. Our design is grounded in two empirical observations: (1) decoded token entropy correlates with KV cache drift, providing a cheap proxy for cache staleness, and (2) feature volatility of decoded tokens persists for multiple steps after unmasking, motivating recomputation of the $k$ most recently decoded tokens. The skip-or-recompute decision requires only $O(V)$ computation…
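The skip-or-recompute rule in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `EntropyDecider`, the threshold `tau`, and the policy of refreshing the $k$ most recently decoded positions whenever the full recompute is skipped are all hypothetical choices for this sketch. Each step's signal is the maximum entropy over the distributions of the tokens decoded in that step, so the cost is $O(V)$ per newly decoded token.

```python
import math
from collections import deque
from typing import Deque, List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one token's distribution; O(V) for V entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EntropyDecider:
    """Hypothetical skip-or-recompute policy; names and threshold are assumptions."""

    def __init__(self, tau: float, k: int) -> None:
        self.tau = tau  # hypothetical entropy threshold (nats)
        # Positions of the k most recently decoded tokens; older ones fall off.
        self.recent: Deque[int] = deque(maxlen=k)

    def step(
        self,
        new_positions: Sequence[int],
        new_probs: Sequence[Sequence[float]],
    ) -> Tuple[str, List[int]]:
        """Return ("full", []) to recompute everything, or
        ("partial", positions) to reuse the cache except for `positions`."""
        # Decision signal: maximum entropy among tokens decoded this step.
        signal = max((entropy(p) for p in new_probs), default=0.0)
        self.recent.extend(new_positions)
        if signal > self.tau:
            return "full", []
        # Skip the full pass but refresh recently decoded (still volatile) tokens.
        return "partial", sorted(self.recent)


if __name__ == "__main__":
    decider = EntropyDecider(tau=1.0, k=2)
    print(decider.step([5], [[0.97, 0.01, 0.01, 0.01]]))  # ('partial', [5])
    print(decider.step([9], [[0.25, 0.25, 0.25, 0.25]]))  # ('full', [])
    print(decider.step([12], [[0.9, 0.1]]))               # ('partial', [9, 12])
```

How `tau` is chosen and whether a full recompute should reset the recent-token window are left open here; the sketch only shows that the decision reads per-token output distributions and never touches the cached keys and values themselves.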

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Natural Language Processing Techniques