X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
Yixiao Zeng, Jianlei Zheng, Chaoda Zheng, Shijia Chen, Mingdian Liu, Tongping Liu, Tengwei Luo, Yu Zhang, Boyang Wang, Linkun Xu, Siyuan Lu, Bo Tian, Xianming Liu

TL;DR
X-Cache is a training-free caching method that accelerates autoregressive world model inference by reusing block computation across consecutive generation chunks. It skips 71% of blocks for a 2.6x wall-clock speedup with minimal degradation in model performance.
Contribution
X-Cache introduces a training-free, cross-chunk caching technique whose dual-metric gating mechanism decides when cached block computation can be reused during autoregressive world model inference.
Findings
Achieves a 71% block skip rate during inference.
Delivers a 2.6x wall-clock speedup.
Incurs only minimal degradation in model performance.
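To relate the skip rate to the speedup, here is a rough Amdahl-style estimate. It is only a sketch under assumptions not stated in the summary: a skipped block is treated as costing nothing, gating overhead is ignored, and `block_fraction` (the share of runtime spent in skippable blocks) is a hypothetical parameter introduced for illustration.

```python
def speedup(block_fraction: float, skip_rate: float) -> float:
    """Amdahl-style estimate: only the skippable share of runtime shrinks."""
    remaining = 1.0 - block_fraction * skip_rate
    return 1.0 / remaining


def implied_block_fraction(observed_speedup: float, skip_rate: float) -> float:
    """Invert the estimate: the runtime share blocks would need to hold."""
    return (1.0 - 1.0 / observed_speedup) / skip_rate


# Assumption: blocks account for all of the runtime (best case).
upper_bound = speedup(block_fraction=1.0, skip_rate=0.71)
# Assumption: the reported 2.6x comes purely from skipping blocks.
share = implied_block_fraction(observed_speedup=2.6, skip_rate=0.71)

print(f"upper bound if blocks were all of the runtime: {upper_bound:.2f}x")
print(f"block share consistent with 2.6x: {share:.2f}")
```

Under these simplifications, a 71% skip rate caps the speedup at roughly 3.45x, and the reported 2.6x would correspond to about 87% of runtime sitting in skippable blocks. The actual cost breakdown is not given here, so these numbers are illustrative only.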
Abstract
Real-time world simulation is becoming key infrastructure for scalable evaluation and online reinforcement learning of autonomous driving systems. Recent driving world models built on autoregressive video diffusion achieve high-fidelity, controllable multi-camera generation, but their inference cost remains a bottleneck for interactive deployment. Existing diffusion caching methods are designed for offline video generation with multiple denoising steps and do not transfer to this scenario. Few-step distilled models have no inter-step redundancy left for these methods to reuse, and sequence-level parallelization techniques require future conditioning that closed-loop interactive generation does not provide. We present X-Cache, a training-free acceleration method that caches along a different axis: across consecutive generation chunks rather than across denoising steps.…
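The idea of caching across chunks with a two-metric gate can be sketched as follows. This is a toy illustration, not the paper's implementation: the summary does not say which two metrics are used, what is cached, or which thresholds apply. The choices here (relative L1 distance and cosine similarity on block inputs, a cached residual, and the `max_rel_l1` / `min_cosine` values) are all assumptions, and `CrossChunkCache` is a hypothetical name.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def rel_l1(a: Vector, b: Vector) -> float:
    """Relative L1 distance between a and b, normalised by the L1 norm of b."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    return num / (sum(abs(y) for y in b) + 1e-8)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


class CrossChunkCache:
    """Per-block cache reused across chunks (hypothetical design)."""

    def __init__(self, max_rel_l1: float = 0.05, min_cosine: float = 0.99):
        self.max_rel_l1 = max_rel_l1  # assumed threshold
        self.min_cosine = min_cosine  # assumed threshold
        # block index -> (input at last full compute, residual = output - input)
        self.store: Dict[int, Tuple[Vector, Vector]] = {}
        self.skipped = 0
        self.total = 0

    def run_block(self, idx: int, block: Block, x: Vector) -> Vector:
        self.total += 1
        cached = self.store.get(idx)
        if cached is not None:
            prev_x, prev_res = cached
            # Dual-metric gate: both checks must pass before reusing the cache.
            if (rel_l1(x, prev_x) <= self.max_rel_l1
                    and cosine(x, prev_x) >= self.min_cosine):
                self.skipped += 1
                return [xi + ri for xi, ri in zip(x, prev_res)]
        # Gate failed or nothing cached yet: compute and refresh the cache.
        y = block(x)
        self.store[idx] = (list(x), [yi - xi for yi, xi in zip(y, x)])
        return y


def generate_chunk(blocks: List[Block], cache: CrossChunkCache, x: Vector) -> Vector:
    """Run one chunk through all blocks, skipping where the gate allows."""
    for idx, block in enumerate(blocks):
        x = cache.run_block(idx, block, x)
    return x


# Toy stand-ins for model blocks and chunk inputs.
blocks: List[Block] = [
    lambda v: [1.1 * e + 0.1 for e in v],
    lambda v: [0.9 * e for e in v],
]
cache = CrossChunkCache()
for chunk_input in ([1.0, 2.0, 3.0], [1.01, 2.0, 3.0], [-3.0, 5.0, 0.5]):
    generate_chunk(blocks, cache, chunk_input)

print(f"skipped {cache.skipped} of {cache.total} block calls")
```

In this toy run the second chunk closely resembles the first, so both of its blocks reuse cached residuals, while the third chunk differs enough to force recomputation. The cached input is refreshed only on a full compute, so the gate always measures drift against the last real computation rather than against an already approximated chunk.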
