WorldKV: Efficient World Memory with World Retrieval and Compression

Jung Yi; Minjae Kim; Paul Hyunbin Cho; Wooseok Jang; Sangdoo Yun; Seungryong Kim

arXiv:2605.22718 · cs.CV · May 22, 2026

TL;DR

WorldKV is a training-free framework for persistent world modeling in autoregressive video diffusion. It combines retrieval and compression of KV-cache chunks to match or exceed full-KV memory fidelity at roughly twice the throughput, with no fine-tuning.

Contribution

WorldKV introduces a training-free approach that pairs World Retrieval with World Compression to maintain long-term consistency without the cost of a full KV cache.

Findings

01. Matches or exceeds full-KV memory fidelity

02. Doubles throughput compared to full-KV methods

03. Operates without fine-tuning, competitive with trained baselines

Abstract

Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks real-time constraints: memory footprint and attention cost grow linearly with rollout length. Sliding-window inference restores throughput but discards long-term consistency. We propose WorldKV, a training-free framework with two components: World Retrieval and World Compression. World Retrieval stores evicted KV-cache chunks in GPU/CPU memory and selectively retrieves scene-relevant chunks via camera/action correspondence, inserting them back into the native attention window without re-encoding. World Compression prunes redundant tokens within each chunk via key-key similarity to an anchor…
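
The two components described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the abstract is truncated before it specifies the details, so every concrete choice here is an assumption: the function names `retrieve_chunks` and `compress_chunk`, the `store` layout, the use of Euclidean distance between camera positions as a stand-in for "camera/action correspondence", the use of a single anchor key vector, and the rule that tokens whose keys are most similar to the anchor are treated as redundant.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_chunks(store, query_pose, top_k):
    """Return ids of the top_k stored chunks closest to query_pose.

    ASSUMPTION: scene relevance is approximated by Euclidean distance
    between camera positions. The paper's criterion may differ.
    """
    ranked = sorted(store, key=lambda cid: math.dist(store[cid]["pose"], query_pose))
    return ranked[:top_k]

def compress_chunk(keys, anchor, keep_ratio):
    """Return the sorted indices of the tokens kept after pruning.

    ASSUMPTION: a token is redundant when its key is highly similar to
    the anchor key, so the least similar tokens are kept first.
    """
    n_keep = max(1, math.ceil(len(keys) * keep_ratio))
    # Ascending similarity: least redundant tokens come first.
    order = sorted(range(len(keys)), key=lambda i: cosine(keys[i], anchor))
    return sorted(order[:n_keep])

# Hypothetical evicted chunks, each tagged with the camera position it was generated at.
store = {
    "a": {"pose": (0.0, 0.0, 0.0)},
    "b": {"pose": (5.0, 0.0, 0.0)},
    "c": {"pose": (0.5, 0.0, 0.0)},
}
nearby = retrieve_chunks(store, query_pose=(0.0, 0.0, 0.0), top_k=2)

# Hypothetical 2-D keys for four tokens in one chunk, plus an anchor key.
keys = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (-1.0, 0.0)]
anchor = (1.0, 0.0)
kept = compress_chunk(keys, anchor, keep_ratio=0.5)
```

In this toy setup, `nearby` holds the two chunks generated closest to the current viewpoint, and `kept` holds the indices of the half of the tokens whose keys differ most from the anchor. A real system would operate on per-layer, per-head key/value tensors and insert the retrieved chunks into the attention window, which this sketch does not attempt.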

Code & Models

Repositories

https://cvlab-kaist.github.io/WorldKV
