Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

Xinyin Ma; Gongfan Fang; Michael Bi Mi; Xinchao Wang

arXiv:2406.01733 · cs.LG · November 19, 2024 · 1 citation

TL;DR

This paper introduces Learning-to-Cache (L2C), a dynamic caching scheme that learns which layers of a diffusion transformer are redundant at each timestep and reuses their cached outputs, significantly reducing inference computation with little loss in quality.

Contribution

The paper proposes a novel differentiable optimization approach for dynamic layer caching in diffusion transformers. On U-ViT-H/2 it removes up to 93.68% of the computation in cache steps (46.84% over all steps) with less than 0.01 drop in FID.
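The idea of making caching decisions differentiable can be illustrated with a toy relaxation. The sketch below is not the authors' implementation: it assumes each layer's output at a timestep is a single scalar, that a hypothetical gate `sigmoid(theta)` linearly blends the freshly computed output with the output cached from the previous step, and that a hypothetical penalty weight `lam` charges for recomputation. Layers whose gates end up below 0.5 would be cached.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def learn_cache_gates(fresh: List[float], cached: List[float],
                      lam: float = 0.05, lr: float = 0.5,
                      steps: int = 500) -> List[float]:
    """Toy differentiable caching (hypothetical, not the paper's exact method).

    fresh[i]  : output of layer i if it is recomputed at this timestep
    cached[i] : output of layer i reused from the previous timestep
    Gate g_i = sigmoid(theta_i) blends the two:
        mix_i = g_i * fresh_i + (1 - g_i) * cached_i
    Loss = sum_i (mix_i - fresh_i)^2 + lam * sum_i g_i
    The first term keeps the blended output close to the full computation;
    the second term penalizes recomputation, pushing gates toward caching.
    """
    theta = [0.0] * len(fresh)
    for _ in range(steps):
        for i, (f, c) in enumerate(zip(fresh, cached)):
            g = sigmoid(theta[i])
            mix = g * f + (1.0 - g) * c
            # dLoss/dg = 2 (mix - f)(f - c) + lam
            dloss_dg = 2.0 * (mix - f) * (f - c) + lam
            # Chain rule through the sigmoid: dg/dtheta = g (1 - g)
            theta[i] -= lr * dloss_dg * g * (1.0 - g)
    return [sigmoid(t) for t in theta]


if __name__ == "__main__":
    fresh = [1.0, 2.0, 3.0]
    cached = [1.0, 0.0, 2.99]  # layer 1 changes a lot between timesteps
    gates = learn_cache_gates(fresh, cached)
    cache_layer = [g < 0.5 for g in gates]
    print(cache_layer)  # [True, False, True]
```

In this toy, layer 1, whose output differs most between timesteps, keeps a high gate and is recomputed, while the other two drift toward caching. Thresholding the relaxed gates into booleans is one simple way to turn the continuous solution into a fixed caching decision.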

Findings

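1. L2C outperforms existing cache-based methods and fast samplers such as DDIM and DPM-Solver.
2. For U-ViT-H/2, up to 93.68% of the computation in cache steps (46.84% across all steps) can be removed with less than 0.01 drop in FID.
3. Although the cached layers differ from timestep to timestep, the learned schedule is fixed after training, so L2C yields a static computation graph that is well suited to fast inference.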
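Because the caching schedule is fixed once training ends, inference reduces to looking up a precomputed boolean table. The sketch below assumes a toy model in which each layer is a residual block `h = h + branch(h)` and caching means reusing that block's branch output from the most recent step in which it was computed. The function name `run_with_schedule` and the scalar "hidden state" are hypothetical simplifications, and the paper's exact caching granularity may differ.

```python
from typing import Callable, List, Optional, Tuple

Branch = Callable[[float], float]


def run_with_schedule(branches: List[Branch], inputs: List[float],
                      schedule: List[List[bool]]) -> Tuple[List[float], int]:
    """Run a toy residual stack over several denoising steps.

    schedule[t][i] is True if layer i reuses its cached branch output at step t.
    The schedule is fixed in advance, so the same layers are skipped on every
    run regardless of the input: the computation graph per step is static.
    Returns the final hidden state of each step and the number of branch
    evaluations actually performed.
    """
    cache: List[Optional[float]] = [None] * len(branches)
    outputs: List[float] = []
    evaluated = 0
    for t, x in enumerate(inputs):
        h = x
        for i, branch in enumerate(branches):
            if schedule[t][i] and cache[i] is not None:
                delta = cache[i]      # reuse the stale branch output
            else:
                delta = branch(h)     # recompute and refresh the cache
                cache[i] = delta
                evaluated += 1
            h = h + delta             # residual connection
        outputs.append(h)
    return outputs, evaluated


if __name__ == "__main__":
    branches = [lambda h: 0.1 * h, lambda h: 1.0, lambda h: h * h]
    # Step 0 is a full step; step 1 is a cache step that skips layers 0 and 1.
    schedule = [[False, False, False], [True, True, False]]
    outs, n = run_with_schedule(branches, [1.0, 2.0], schedule)
    print(n)  # 4 of 6 branch evaluations performed
```

Here the cache step skips two of its three branches, so two of the six evaluations are saved. The reused output of layer 0 is stale (it was computed from the step-0 input), which is the approximation a learned schedule tries to confine to layers where it is harmless.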

Abstract

Diffusion Transformers have recently demonstrated unprecedented generative capabilities for various tasks. The encouraging results, however, come at the cost of slow inference, since each denoising step requires inference on a transformer model with a large number of parameters. In this study, we make an interesting and somewhat surprising observation: by introducing a caching mechanism, the computation of a large proportion of layers in the diffusion transformer can be readily removed, even without updating the model parameters. In the case of U-ViT-H/2, for example, we may remove up to 93.68% of the computation in the cache steps (46.84% for all steps), with less than 0.01 drop in FID. To achieve this, we introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Specifically, by leveraging the identical…
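The two reduction figures quoted above are related by simple arithmetic: 46.84% is exactly half of 93.68%, which is what one gets if cache steps make up half of all denoising steps and every step costs the same when run in full. That reading is an inference from the numbers, not a detail stated in the abstract, and the step count used below is an arbitrary, hypothetical choice.

```python
def overall_fraction_removed(num_steps: int, cache_steps: int,
                             removed_per_cache_step: float) -> float:
    """Fraction of total computation removed, assuming every step costs the
    same when run in full and only cache steps skip work."""
    if not 0 <= cache_steps <= num_steps:
        raise ValueError("cache_steps must be between 0 and num_steps")
    return cache_steps * removed_per_cache_step / num_steps


if __name__ == "__main__":
    num_steps = 50  # hypothetical; any even count gives the same ratio
    frac = overall_fraction_removed(num_steps, num_steps // 2, 0.9368)
    print(f"{frac:.2%}")  # 46.84%
```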

Code & Models

Repositories

horseee/learning-to-cache
PyTorch · Official

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Low-power high-performance VLSI design · Analog and Mixed-Signal Circuit Design

Methods: Diffusion