SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers

Joseph Liu; Joshua Geddes; Ziyu Guo; Haomiao Jiang; Mahesh Kumar Nandwana

arXiv:2411.10510·cs.LG·May 23, 2025

TL;DR

SmoothCache is a model-agnostic inference acceleration method for Diffusion Transformers that adaptively caches and reuses layer features, speeding up generation by 8% to 71% while maintaining or improving quality across image, video, and audio modalities.

Contribution

It introduces a model-agnostic caching technique that leverages layer output similarities to accelerate inference in Diffusion Transformers across diverse tasks.

Findings

1. Achieves 8% to 71% speedup in inference
2. Maintains or improves generation quality
3. Effective across image, video, and audio modalities

Abstract

Diffusion Transformers (DiT) have emerged as powerful generative models for various tasks, including image, video, and speech synthesis. However, their inference process remains computationally expensive due to the repeated evaluation of resource-intensive attention and feed-forward modules. To address this, we introduce SmoothCache, a model-agnostic inference acceleration technique for DiT architectures. SmoothCache leverages the observed high similarity between layer outputs across adjacent diffusion timesteps. By analyzing layer-wise representation errors from a small calibration set, SmoothCache adaptively caches and reuses key features during inference. Our experiments demonstrate that SmoothCache achieves an 8% to 71% speedup while maintaining or even improving generation quality across diverse modalities. We showcase its effectiveness on DiT-XL for image generation, Open-Sora for…
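
The caching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the relative L1 error metric, the `threshold` and `max_skip` parameters, and the function names `relative_l1_error`, `build_cache_schedule`, and `run_layer_with_cache` are all assumptions made for illustration. The sketch takes one layer's calibration outputs per timestep, marks a timestep as reusable when its output stays close to the last computed one, and then replays that schedule at inference time.

```python
from typing import Callable, List, Sequence


def relative_l1_error(current: Sequence[float], previous: Sequence[float]) -> float:
    """Relative L1 distance between two layer outputs (assumed metric)."""
    num = sum(abs(c - p) for c, p in zip(current, previous))
    den = sum(abs(c) for c in current)
    if den == 0:
        return 0.0 if num == 0 else float("inf")
    return num / den


def build_cache_schedule(
    calibration_outputs: Sequence[Sequence[float]],
    threshold: float,
    max_skip: int = 3,
) -> List[bool]:
    """Return one flag per timestep: True = recompute, False = reuse cache.

    calibration_outputs[t] is the layer output at timestep t, for example
    averaged over a small calibration set. Both `threshold` and `max_skip`
    are hypothetical knobs, not values taken from the paper.
    """
    schedule = [True]  # the first timestep always has to be computed
    last_computed = 0
    for t in range(1, len(calibration_outputs)):
        err = relative_l1_error(
            calibration_outputs[t], calibration_outputs[last_computed]
        )
        if err < threshold and t - last_computed <= max_skip:
            schedule.append(False)  # output is close enough: reuse the cache
        else:
            schedule.append(True)  # output drifted or cache is too old
            last_computed = t
    return schedule


def run_layer_with_cache(
    layer_fn: Callable, inputs: Sequence, schedule: Sequence[bool]
) -> list:
    """Apply layer_fn only where the schedule says so; otherwise reuse."""
    cached = None
    outputs = []
    for x, recompute in zip(inputs, schedule):
        if recompute or cached is None:
            cached = layer_fn(x)
        outputs.append(cached)
    return outputs
```

In this sketch, a lower `threshold` recomputes more often and stays closer to the uncached output, while a higher one skips more layer evaluations; the schedule is built once from calibration data and then reused for every generation.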

Code & Models

Repositories

roblox/smoothcache (PyTorch, official)

Taxonomy

Topics: Nuclear Physics and Applications · Nuclear Materials and Properties

Methods: Softmax · Attention Is All You Need · Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings