InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models
Zihao Wu

TL;DR
InvarDiff is a training-free method that accelerates diffusion model sampling by exploiting feature invariance across timesteps and layers, achieving 2-3x speed-ups with minimal quality loss.
Contribution
We introduce InvarDiff, a novel caching technique leveraging cross-scale invariance in diffusion models for faster inference without retraining.
Findings
Achieves 2-3x speed-up in diffusion sampling.
Maintains high fidelity with minimal quality degradation.
Applicable to models like DiT and FLUX.
Abstract
Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe feature invariance in deterministic sampling and present InvarDiff, a training-free acceleration method that exploits the relative temporal invariance across the timestep scale and the layer scale. From a few deterministic runs, we compute a per-timestep, per-layer, per-module binary cache plan matrix and use a re-sampling correction to avoid drift when consecutive caches occur. Using quantile-based change metrics, this matrix specifies which module at which step is reused rather than recomputed. The same invariance criterion is applied at the step scale to enable cross-timestep caching, deciding whether an entire step can reuse cached results. During inference, InvarDiff performs step-first and layer-wise caching guided by this matrix. When applied to DiT and FLUX,…
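To make the cache plan matrix concrete, here is a minimal Python sketch of the idea as the abstract describes it, not the authors' implementation. Everything specific in it is an assumption made for illustration: the feature layout `(runs, steps, layers, modules, dim)`, the relative L2 change metric, a single global quantile threshold, the function names `build_cache_plan` and `sample_with_plan`, the toy residual modules, and the Euler-style update. The step-level rule (reuse the previous prediction when every module in a step is marked) is also a guess, since the abstract only says the same criterion is applied at the step scale. The re-sampling correction is omitted because the abstract does not specify it.

```python
import numpy as np


def build_cache_plan(features, quantile=0.5):
    """Build a boolean plan[step, layer, module]; True means "reuse the cache".

    features: hypothetical array of shape (runs, steps, layers, modules, dim)
    holding module outputs recorded from a few deterministic sampling runs.
    """
    feats = np.asarray(features, dtype=float)
    # Relative change of each module output between consecutive timesteps.
    delta = np.linalg.norm(feats[:, 1:] - feats[:, :-1], axis=-1)
    scale = np.linalg.norm(feats[:, :-1], axis=-1) + 1e-8
    rel_change = (delta / scale).mean(axis=0)  # (steps - 1, layers, modules)
    # Assumed rule: entries at or below the chosen quantile count as invariant.
    threshold = np.quantile(rel_change, quantile)
    plan = np.zeros(feats.shape[1:4], dtype=bool)
    plan[1:] = rel_change <= threshold  # step 0 is always computed
    return plan


def sample_with_plan(modules, x, plan, dt=0.1):
    """Toy sampler: step-level check first, then per-module reuse.

    modules[l][m] is a callable (h, t) -> residual; a stand-in for real blocks.
    Returns the final sample and the number of module evaluations performed.
    """
    steps, n_layers, n_modules = plan.shape
    cache = {}        # latest output of each (layer, module)
    last_pred = None  # latest full model prediction
    n_computed = 0
    for t in range(steps):
        if last_pred is not None and plan[t].all():
            pred = last_pred  # whole step reuses the cached prediction
        else:
            h = x
            for l in range(n_layers):
                for m in range(n_modules):
                    if plan[t, l, m] and (l, m) in cache:
                        out = cache[(l, m)]  # reuse instead of recomputing
                    else:
                        out = modules[l][m](h, t)
                        cache[(l, m)] = out
                        n_computed += 1
                    h = h + out
            pred = h
            last_pred = pred
        x = x - dt * pred  # placeholder update, not a real diffusion solver
    return x, n_computed


# Usage with synthetic data (random features stand in for recorded activations).
rng = np.random.default_rng(0)
recorded = rng.normal(size=(2, 6, 3, 2, 4))
toy_plan = build_cache_plan(recorded, quantile=0.5)
toy_modules = [[(lambda h, t: 0.1 * h) for _ in range(2)] for _ in range(3)]
sample, evaluations = sample_with_plan(toy_modules, np.ones(4), toy_plan)
```

In this sketch the plan is computed once offline and only read at sampling time, so the cost of each decision during inference is a single boolean lookup. A higher `quantile` marks more entries as reusable and trades quality for speed.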
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Image and Video Quality Assessment
