PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation

Jiangshan Wang; Kang Zhao; Jiayi Guo; Jiayu Wang; Hang Guo; Chenyang Zhu; Xiu Li; Xiangyu Yue

arXiv:2603.00976 · cs.CV · March 4, 2026

Open Access · 3 Reviews

TL;DR

PreciseCache is a framework that accelerates high-fidelity video generation by accurately detecting and skipping redundant computations at both step and block levels, maintaining quality while significantly improving speed.

Contribution

It introduces a precise redundancy detection method, comprising LFCache for step-wise caching and BlockCache for block-wise caching, that enables faster video generation without quality degradation.

Findings

01

Achieves a 2.6x speedup on the Wan2.1-14B model

02

Effectively detects redundant features using the Low-Frequency Difference (LFD)

03

Maintains high-quality output despite acceleration

Abstract

High computational costs and slow inference hinder the practical application of video generation models. While prior works accelerate the generation process through feature caching, they often suffer from notable quality degradation. In this work, we reveal that this issue arises from their inability to distinguish truly redundant features, which leads to the unintended skipping of computations on important features. To address this, we propose PreciseCache, a plug-and-play framework that precisely detects and skips truly redundant computations, thereby accelerating inference without sacrificing quality. Specifically, PreciseCache contains two components: LFCache for step-wise caching and BlockCache for block-wise caching. For LFCache, we compute the Low-Frequency Difference (LFD) between the prediction features of the current step and those from the previous cached step.…
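The LFCache criterion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the low-frequency band is a circular region of radius `radius` around DC in the 2D spatial spectrum, that LFD is the relative L2 norm of the low-passed difference, and that a step reuses the cached prediction when LFD falls below a hypothetical `threshold`. The loop also assumes the current-step features are already available, which sidesteps how PreciseCache obtains them cheaply.

```python
import numpy as np

def low_pass_spectrum(x: np.ndarray, radius: float) -> np.ndarray:
    """Centered 2D spectrum of x over its last two (spatial) axes, zeroed outside `radius`."""
    spec = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    return spec * (dist <= radius)  # keep only frequencies close to DC

def low_frequency_difference(cur: np.ndarray, prev: np.ndarray,
                             radius: float = 4.0, eps: float = 1e-8) -> float:
    """Hypothetical LFD: relative L2 norm of the low-frequency part of (cur - prev)."""
    num = np.linalg.norm(low_pass_spectrum(cur - prev, radius))
    den = np.linalg.norm(low_pass_spectrum(prev, radius))
    return float(num / (den + eps))

def lfcache_decisions(step_features, threshold: float = 0.05, radius: float = 4.0):
    """Return one flag per step: True means the step reuses the cached prediction."""
    cached = None
    flags = []
    for feat in step_features:
        if cached is not None and low_frequency_difference(feat, cached, radius) < threshold:
            flags.append(True)   # redundant step: skip the full forward pass
        else:
            cached = feat        # key step: compute fully and refresh the cache
            flags.append(False)
    return flags

rng = np.random.default_rng(0)
base = rng.standard_normal((4, 32, 32))
checker = 1.0 - 2.0 * (np.indices((32, 32)).sum(axis=0) % 2)  # pure Nyquist-frequency pattern
print(lfcache_decisions([base, base + checker, base + 1.0]))
```

In this toy run, the checkerboard perturbation lives entirely at the highest spatial frequency, so it leaves LFD near zero and that step is skipped, while a constant offset changes the DC component and forces recomputation.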

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

PreciseCache demonstrates outstanding performance in **redundancy detection precision** and **cross-model adaptability**, particularly excelling in video generation (a highly complex task), where it effectively balances the core trade-off between *acceleration ratio* and *quality preservation*.

1. **Frequency-Domain-Based Precise Redundancy Detection, Overcoming Blind Caching**
Existing caching methods (e.g., TeaCache, FasterCache) often rely on *uniform time intervals* or *global feature dif

Weaknesses

1. **BlockCache Increases Memory Overhead, Limiting Single-GPU Deployment**
BlockCache caches *input–output deltas* for each Transformer block, significantly increasing GPU memory usage, especially in large-scale or high-resolution setups:
- Appendix A.2 states that for Wan2.1-14B (1080P), PreciseCache-Flash cannot run on a single 80 GB A800 GPU and requires multi-GPU execution.
- The paper reports latency and MACs but omits explicit memory growth metrics (e.g., +20% or +50% vs bas
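To make the memory concern concrete, the following sketch caches one input–output delta per block and replays it for non-key blocks. It is a minimal illustration under stated assumptions: `BlockDeltaCache`, `estimated_cache_bytes`, the toy blocks, and the rule "recompute key blocks, reuse cached deltas elsewhere" are hypothetical stand-ins for BlockCache, whose exact update rule is not given here. Because every cached delta has the same shape as the block's hidden states, cache memory grows roughly as `num_blocks × tokens × hidden_dim × bytes_per_element`, which is why high-resolution, long-video settings are the ones most likely to exceed a single GPU.

```python
from typing import Callable, Dict, Sequence, Set

import numpy as np

Block = Callable[[np.ndarray], np.ndarray]

class BlockDeltaCache:
    """Hypothetical sketch: cache (output - input) per block and replay it when a block is skipped."""

    def __init__(self, blocks: Sequence[Block]):
        self.blocks = list(blocks)
        self.deltas: Dict[int, np.ndarray] = {}

    def forward(self, x: np.ndarray, key_blocks: Set[int]) -> np.ndarray:
        for i, block in enumerate(self.blocks):
            if i in key_blocks or i not in self.deltas:
                y = block(x)              # full computation for key (or not-yet-cached) blocks
                self.deltas[i] = y - x    # refresh this block's cached delta
            else:
                y = x + self.deltas[i]    # skipped block: approximate its output
            x = y
        return x

    def cache_bytes(self) -> int:
        """Memory held by cached deltas; grows linearly with the number of cached blocks."""
        return sum(d.nbytes for d in self.deltas.values())

def estimated_cache_bytes(num_blocks: int, tokens: int, hidden_dim: int,
                          bytes_per_elem: int = 2) -> int:
    """Upper-bound estimate assuming every block caches one full-size delta (e.g. in bf16)."""
    return num_blocks * tokens * hidden_dim * bytes_per_elem

cache = BlockDeltaCache([lambda x: x + 1.0, lambda x: x * 2.0])
cache.forward(np.zeros(3), key_blocks=set())         # warm-up pass fills both deltas
print(cache.forward(np.ones(3), key_blocks={0}))     # block 1 is skipped and replayed
```

With an input of ones, the exact output of the two toy blocks is 4, but replaying the stale delta of the second block yields 3. That gap is the accuracy cost a block-level cache has to keep small, and it is paid for with one stored delta per block.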

Reviewer 02 · Rating 4 · Confidence 3

Strengths

This paper proposes a novel adaptive skipping strategy, which achieves adaptive skipping with minimal computation by correlating output differences with low-frequency differences and further associating them with lightweight subsampling.

Weaknesses

1. The LFD method proposed in this paper essentially implements adaptive skipping. How does this method perform compared to AdaptiveDiffusion (Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy)?
2. The BlockCache method achieves acceleration by caching block differences. How does this method perform in comparison with ∆-DiT (∆-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers)?
3. Can the method in this paper be presented more clearly in t

Reviewer 03 · Rating 8 · Confidence 5

Strengths

+ The step-wise LFCache determines when to skip entire denoising steps, while the block-wise BlockCache further skips redundant network blocks within the key steps, forming a hierarchical strategy that compresses redundant computation in a simple yet effective manner.
+ By performing a spatiotemporal downsampled “trial run” on the latent to estimate the LFD, the method greatly reduces the decision overhead compared to a full forward pass, which is an idea both interesting and effective.
+ The
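The downsampled "trial run" described in this review can be sketched as below. Every detail here is a hypothetical assumption rather than the paper's procedure: the latent has shape `(C, T, H, W)`, downsampling is average pooling with strides `t_stride` and `s_stride`, and the pooled outputs are compared with a relative L2 norm. Average pooling is itself a crude low-pass filter, so this comparison mostly reflects low-frequency change while the model processes only `1 / (t_stride * s_stride**2)` of the latent tokens.

```python
from typing import Callable, Tuple

import numpy as np

def avg_pool_3d(latent: np.ndarray, t_stride: int, s_stride: int) -> np.ndarray:
    """Average-pool a (C, T, H, W) latent over time and both spatial axes."""
    c, t, h, w = latent.shape
    if t % t_stride or h % s_stride or w % s_stride:
        raise ValueError("latent dims must be divisible by the strides")
    blocks = latent.reshape(c, t // t_stride, t_stride,
                            h // s_stride, s_stride, w // s_stride, s_stride)
    return blocks.mean(axis=(2, 4, 6))

def token_fraction(t_stride: int, s_stride: int) -> float:
    """Fraction of latent tokens the trial run processes."""
    return 1.0 / (t_stride * s_stride * s_stride)

def trial_run_difference(model: Callable[[np.ndarray], np.ndarray], latent: np.ndarray,
                         cached_small_out: np.ndarray, t_stride: int = 2, s_stride: int = 4,
                         eps: float = 1e-8) -> Tuple[float, np.ndarray]:
    """Run `model` on a pooled latent and compare it with the cached pooled output."""
    small_out = model(avg_pool_3d(latent, t_stride, s_stride))
    rel = np.linalg.norm(small_out - cached_small_out) / (np.linalg.norm(cached_small_out) + eps)
    return float(rel), small_out

rng = np.random.default_rng(1)
latent = rng.standard_normal((4, 8, 16, 16))
toy_model = lambda z: 0.5 * z  # hypothetical stand-in for the denoiser
_, cached = trial_run_difference(toy_model, latent, np.ones((4, 4, 4, 4)))
print(trial_run_difference(toy_model, latent + 1.0, cached)[0])
```

The returned difference could then be compared against a threshold to decide whether the full-resolution step is needed; with the default strides the trial run touches 1/32 of the tokens.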

Weaknesses

- The paper involves several hyperparameter settings, such as the low-frequency cutoff radius, the cache window size L, and the key-block selection ratio Top-c% in BlockCache. However, the current version lacks experiments or analyses demonstrating the robustness of these choices. It would strengthen the work to include additional studies or sensitivity analyses that show how variations in these hyperparameters affect performance and stability.
- The method suffers from excessive memory consump
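One way to expose the Top-c% hyperparameter for the kind of sensitivity study requested here is sketched below. The scoring rule is an assumption for illustration only: each block is scored by the relative change of its cached delta between two full computations, and the top c% of blocks (rounded up, at least one) are treated as key blocks that must be recomputed. The functions `block_scores`, `select_key_blocks`, and `sweep_c` are hypothetical names, and the low-frequency cutoff radius and cache window L are not modeled.

```python
import math
from typing import Dict, List, Sequence

import numpy as np

def block_scores(prev_deltas: Sequence[np.ndarray], cur_deltas: Sequence[np.ndarray],
                 eps: float = 1e-8) -> List[float]:
    """Hypothetical score: relative change of each block's delta between two full passes."""
    return [float(np.linalg.norm(c - p) / (np.linalg.norm(p) + eps))
            for p, c in zip(prev_deltas, cur_deltas)]

def select_key_blocks(scores: Sequence[float], c_percent: float) -> List[int]:
    """Indices of the top-c% blocks by score (at least one block is always kept)."""
    k = max(1, math.ceil(len(scores) * c_percent / 100.0))
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return sorted(int(i) for i in order[:k])

def sweep_c(scores: Sequence[float], c_values: Sequence[float]) -> Dict[float, int]:
    """Number of blocks that would be recomputed for each candidate c."""
    return {c: len(select_key_blocks(scores, c)) for c in c_values}

scores = [0.1, 0.9, 0.3, 0.7, 0.2]
print(select_key_blocks(scores, 40))
print(sweep_c(scores, [10, 40, 100]))
```

A sweep like this only reports how much compute each c would spend; a real sensitivity analysis would pair each setting with latency and quality measurements.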

Taxonomy

Topics: Image and Video Quality Assessment · Caching and Content Delivery · Generative Adversarial Networks and Image Synthesis