UniCP: A Unified Caching and Pruning Framework for Efficient Video Generation
Wenzhang Sun, Qirui Hou, Donglin Di, Jiahui Yang, Yongjia Ma, Jianxun Cui

TL;DR
UniCP introduces a unified caching and pruning framework that dynamically optimizes attention computation in diffusion transformers, significantly improving efficiency while maintaining video quality in generation tasks.
Contribution
It proposes a novel framework combining dynamic caching and pruning techniques to address computational challenges in diffusion transformer-based video generation.
Findings
Outperforms existing methods in efficiency and quality
Effectively adapts cache windows to error fluctuations
Reduces redundant attention components with PCA slicing
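One way to read the PCA slicing finding is sketched below in Python with NumPy. This is an assumption, not the paper's PCAS. It estimates principal directions from stacked query and key features, keeps the top components that explain a chosen fraction of variance (`var_ratio`), and computes attention scores in that smaller space. `pca_basis` and `sliced_attention` are hypothetical names.

```python
import numpy as np


def pca_basis(features: np.ndarray, var_ratio: float = 0.95) -> np.ndarray:
    """Return a (d, r) orthonormal basis of the top principal directions of
    `features` (shape (n, d)) that together explain >= `var_ratio` of the variance."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    r = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    return vt[: min(r, len(s))].T


def sliced_attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray,
                     basis: np.ndarray) -> np.ndarray:
    """Softmax attention whose query-key scores are computed in the PCA subspace."""
    q_r, k_r = queries @ basis, keys @ basis            # (n, r) instead of (n, d)
    scores = q_r @ k_r.T / np.sqrt(queries.shape[-1])  # keep the original 1/sqrt(d) scaling
    scores -= scores.max(axis=-1, keepdims=True)       # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values


# Hypothetical usage on random features.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(16, 8)), rng.normal(size=(16, 8)), rng.normal(size=(16, 4))
basis = pca_basis(np.vstack([q, k]), var_ratio=0.9)
out = sliced_attention(q, k, v, basis)
```

When the kept basis spans all `d` dimensions, the scores equal the full `QK^T` up to rounding. Dropping low-variance directions trades a small score error for an `n × n × r` score computation instead of `n × n × d`.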
Abstract
Diffusion Transformers (DiT) excel in video generation but face significant computational challenges due to the quadratic complexity of attention. Notably, attention differences between adjacent diffusion steps follow a U-shaped pattern. Current methods exploit this property by caching attention blocks; however, they still struggle with sudden error spikes and large discrepancies. To address these issues, we propose UniCP, a unified caching and pruning framework for efficient video generation. UniCP optimizes both the temporal and spatial dimensions through two components. Error-Aware Dynamic Cache Window (EDCW) dynamically adjusts cache window sizes for different blocks at various timesteps, adapting to abrupt error changes. PCA-based Slicing (PCAS) and Dynamic Weight Shift (DWS): PCAS prunes redundant attention components, and DWS integrates caching and pruning by enabling dynamic switching…
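A minimal Python sketch of error-aware cache windows follows. It is not the authors' EDCW. It assumes the following: per-step change is measured as a relative L1 distance between a block's attention outputs at adjacent steps; these errors are known in advance (for example, profiled offline); and a window closes once the accumulated error exceeds a budget. The names `relative_l1`, `plan_cache_windows`, `threshold`, and `max_window` are hypothetical.

```python
import numpy as np


def relative_l1(prev: np.ndarray, curr: np.ndarray, eps: float = 1e-8) -> float:
    """Relative L1 change between a block's attention outputs at adjacent steps."""
    return float(np.abs(curr - prev).sum() / (np.abs(prev).sum() + eps))


def plan_cache_windows(step_errors, threshold: float, max_window: int) -> list[bool]:
    """Return, for each step, True to recompute attention or False to reuse the cache.

    step_errors[t] is an assumed estimate (e.g. profiled offline) of how much the
    block's attention output changes between step t-1 and step t.
    """
    plan = []
    accumulated = 0.0  # error accumulated since the last recompute
    reused = 0         # consecutive steps served from the cache
    for t, err in enumerate(step_errors):
        accumulated += err
        reused += 1
        # Recompute on the first step, when the accumulated error exceeds the
        # budget (e.g. an abrupt spike), or when the window reaches its cap.
        if t == 0 or accumulated > threshold or reused > max_window:
            plan.append(True)
            accumulated, reused = 0.0, 0
        else:
            plan.append(False)
    return plan


# Hypothetical U-shaped error profile for one block: large at both ends, flat in the middle.
errors = [0.0, 0.30, 0.05, 0.02, 0.02, 0.03, 0.25, 0.40]
print(plan_cache_windows(errors, threshold=0.1, max_window=3))
# → [True, True, False, False, False, True, True, True]
```

Running `plan_cache_windows` separately on each block's error profile gives every block its own window sizes. With a U-shaped profile like the one above, recomputation clusters at the start and end of sampling, and the flat middle is served from the cache.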
Taxonomy
Topics: Video Coding and Compression Technologies · Caching and Content Delivery · Multimedia Communication and Technology
Methods: Softmax · Attention Is All You Need · Principal Components Analysis · Attentive Walk-Aggregating Graph Neural Network · Diffusion · Pruning
