SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
Joseph Liu, Joshua Geddes, Ziyu Guo, Haomiao Jiang, Mahesh Kumar Nandwana

TL;DR
SmoothCache is a universal inference acceleration method for Diffusion Transformers that adaptively caches and reuses layer features, speeding up generation by 8% to 71% while preserving or improving quality across image, video, and audio generation.
Contribution
The paper introduces a model-agnostic caching technique that exploits the high similarity between layer outputs at adjacent diffusion timesteps to accelerate inference in Diffusion Transformers across diverse tasks.
Findings
Achieves 8% to 71% speedup in inference
Maintains or improves generation quality
Effective across image, video, and audio modalities
Abstract
Diffusion Transformers (DiT) have emerged as powerful generative models for various tasks, including image, video, and speech synthesis. However, their inference process remains computationally expensive due to the repeated evaluation of resource-intensive attention and feed-forward modules. To address this, we introduce SmoothCache, a model-agnostic inference acceleration technique for DiT architectures. SmoothCache leverages the observed high similarity between layer outputs across adjacent diffusion timesteps. By analyzing layer-wise representation errors from a small calibration set, SmoothCache adaptively caches and reuses key features during inference. Our experiments demonstrate that SmoothCache achieves a speedup of 8% to 71% while maintaining or even improving generation quality across diverse modalities. We showcase its effectiveness on DiT-XL for image generation, Open-Sora for…
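The caching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a relative L1 error as the similarity measure and a single threshold as the caching rule. The names `relative_l1_error`, `build_cache_schedule`, `run_layer_with_cache`, `threshold`, and `max_skip` are all hypothetical and chosen for illustration only. The sketch averages the error over a tiny calibration set, decides per timestep whether a layer's output may be reused, and then applies that schedule at inference time.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def relative_l1_error(reference: Vector, current: Vector, eps: float = 1e-8) -> float:
    """Relative L1 difference between two layer outputs (assumed metric)."""
    diff = sum(abs(c - r) for r, c in zip(reference, current))
    scale = sum(abs(c) for c in current) + eps
    return diff / scale


def build_cache_schedule(
    calib_outputs: Sequence[Sequence[Vector]],
    threshold: float,
    max_skip: int = 3,
) -> List[bool]:
    """Decide, per timestep, whether one layer must be recomputed.

    calib_outputs[s][t] is the layer output for calibration sample s at
    timestep t. Returns a list where True means "recompute" and False
    means "reuse the cached output" (hypothetical rule, for illustration).
    """
    num_steps = len(calib_outputs[0])
    schedule = [True]  # the first timestep is always computed
    last_computed = 0
    for t in range(1, num_steps):
        # Error of step t against the most recently computed step,
        # averaged over the calibration samples.
        errors = [
            relative_l1_error(sample[last_computed], sample[t])
            for sample in calib_outputs
        ]
        mean_error = sum(errors) / len(errors)
        can_reuse = mean_error < threshold and (t - last_computed) <= max_skip
        schedule.append(not can_reuse)
        if not can_reuse:
            last_computed = t
    return schedule


def run_layer_with_cache(layer_fn: Callable, inputs_per_step: Sequence, schedule: List[bool]):
    """Apply layer_fn only where the schedule says so; otherwise reuse the cache."""
    cached = None
    outputs = []
    calls = 0
    for x, recompute in zip(inputs_per_step, schedule):
        if recompute or cached is None:
            cached = layer_fn(x)
            calls += 1
        outputs.append(cached)
    return outputs, calls


# Toy calibration set: one sample, five timesteps, four features per output.
calib = [[[1.0] * 4, [1.01] * 4, [1.02] * 4, [2.0] * 4, [2.01] * 4]]
schedule = build_cache_schedule(calib, threshold=0.05, max_skip=3)
print(schedule)  # [True, False, False, True, False]
```

In this toy run the layer is evaluated at only two of five timesteps, because its output barely changes between the others. A larger `threshold` reuses the cache more aggressively, trading quality for speed.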
Taxonomy
Topics: Nuclear Physics and Applications · Nuclear Materials and Properties
Methods: Softmax · Attention Is All You Need · Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
