FORA: Fast-Forward Caching in Diffusion Transformer Acceleration

Pratheba Selvaraju; Tianyu Ding; Tianyi Chen; Ilya Zharkov; Luming Liang

arXiv:2407.01425 · cs.CV · July 2, 2024 · 2 cites

TL;DR

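FORA is a training-free caching technique that exploits the repetitive nature of the diffusion process to accelerate diffusion transformers: it stores intermediate attention and MLP outputs and reuses them across denoising steps, lowering the inference cost that makes large models less attractive for real-time image and video generation.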

Contribution

The paper proposes FORA, a caching method that speeds up diffusion transformers by reusing intermediate outputs from the attention and MLP layers across denoising steps. It requires no retraining and integrates with existing transformer-based diffusion models.

Findings

01. Speeds up diffusion transformers several times over

02. Minimal impact on image quality metrics

03. Seamless integration with existing models

Abstract

Diffusion transformers (DiT) have become the de facto choice for generating high-quality images and videos, largely due to their scalability, which enables the construction of larger models for enhanced performance. However, the increased size of these models leads to higher inference costs, making them less attractive for real-time applications. We present Fast-FORward CAching (FORA), a simple yet effective approach designed to accelerate DiT by exploiting the repetitive nature of the diffusion process. FORA implements a caching mechanism that stores and reuses intermediate outputs from the attention and MLP layers across denoising steps, thereby reducing computational overhead. This approach does not require model retraining and seamlessly integrates with existing transformer-based diffusion models. Experiments show that FORA can speed up diffusion transformers several times over…
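The caching idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the `CachedBlock` class, the `interval` parameter (recompute every `interval` steps, reuse the cache otherwise), and the toy `toy_attn` and `toy_mlp` functions that stand in for real attention and MLP layers. Plain Python lists replace tensors so the example runs without any dependencies.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


class CachedBlock:
    """Toy transformer block that reuses attention/MLP outputs between refreshes.

    Hypothetical illustration only; not the authors' implementation.
    """

    def __init__(self, attn: Layer, mlp: Layer, interval: int) -> None:
        self.attn = attn
        self.mlp = mlp
        self.interval = interval  # assumed policy: recompute every `interval` steps
        self.cache: Dict[str, Vector] = {}
        self.recomputes = 0  # number of steps on which the layers were actually run

    def forward(self, x: Vector, step: int) -> Vector:
        # Refresh on the scheduled steps, or when nothing has been cached yet.
        refresh = step % self.interval == 0 or not self.cache
        if refresh:
            self.cache["attn"] = self.attn(x)
            self.recomputes += 1
        # Residual connection around the (possibly cached) attention output.
        h = [a + b for a, b in zip(x, self.cache["attn"])]
        if refresh:
            self.cache["mlp"] = self.mlp(h)
        # Residual connection around the (possibly cached) MLP output.
        return [a + b for a, b in zip(h, self.cache["mlp"])]


def toy_attn(x: Vector) -> Vector:
    """Stand-in for an attention layer (assumption: simple scaling)."""
    return [0.5 * v for v in x]


def toy_mlp(x: Vector) -> Vector:
    """Stand-in for an MLP layer (assumption: simple scaling)."""
    return [0.1 * v for v in x]


def run_demo(num_steps: int, interval: int) -> Tuple[Vector, int]:
    """Run a toy denoising loop and report how often the layers were recomputed."""
    block = CachedBlock(toy_attn, toy_mlp, interval)
    x = [1.0, 2.0]
    for step in range(num_steps):
        # Toy loop: the block output is fed back as the next input.
        # A real sampler would apply a scheduler update here instead.
        x = block.forward(x, step)
    return x, block.recomputes
```

With `run_demo(8, 4)` the layers are evaluated on steps 0 and 4 only, so six of the eight steps reuse cached outputs. Setting `interval=1` disables reuse and recovers the uncached computation. How often to refresh is the knob that trades speed against output quality in a scheme like this.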

Code & Models

Repositories

prathebaselva/fora (PyTorch, official)

Taxonomy

Topics: Power Transformer Diagnostics and Insulation · Magnetic Properties and Applications · Semiconductor materials and devices

Methods: Softmax · Attention Is All You Need · Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings