FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation

Dong Liu; Yanxuan Yu; Jiayi Zhang; Yifan Li; Ben Lengerich; Ying Nian Wu

arXiv:2505.20353 · cs.LG · March 30, 2026

TL;DR

FastCache significantly accelerates diffusion transformer inference by using learnable linear approximation, token selection, and caching strategies to reduce computation while maintaining high generation quality.

Contribution

The paper introduces a hidden-state-level caching and compression framework for diffusion transformers that combines token selection and cross-timestep cache reuse with a learnable linear approximation.

Findings

01

FastCache substantially reduces latency and memory use across multiple DiT variants.

02

FastCache achieves the best generation quality among the cache methods compared, as measured by FID and t-FID.

03

Theoretical analysis confirms bounded approximation error under certain decision rules.

Abstract

Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative structure and deep transformer stacks. To alleviate this inefficiency, we propose FastCache, a hidden-state-level caching and compression framework that accelerates DiT inference by exploiting redundancy within the model's internal representations. FastCache introduces a dual strategy: (1) a spatial-aware token selection mechanism that adaptively filters redundant tokens based on hidden-state saliency, and (2) a transformer-level cache that reuses latent activations across timesteps when changes fall below a predefined threshold. These modules work jointly to reduce unnecessary computation while preserving generation fidelity through learnable linear approximation. Theoretical analysis shows that FastCache maintains bounded approximation error under a…
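The dual strategy in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names, the relative L2 change statistic, the squared-difference saliency score, and the parameters `threshold` and `tau` are all assumptions chosen for illustration. The abstract does not specify the actual decision rule or how the linear map is trained.

```python
from typing import Callable, List, Optional, Tuple

Tokens = List[List[float]]  # one hidden-state vector per token


def token_saliency(curr: Tokens, prev: Tokens) -> List[float]:
    """Per-token squared change between timesteps (assumed saliency score)."""
    return [sum((c - p) ** 2 for c, p in zip(ct, pt)) for ct, pt in zip(curr, prev)]


def select_tokens(curr: Tokens, prev: Tokens, tau: float) -> List[int]:
    """Indices of tokens whose saliency exceeds tau; the rest count as redundant."""
    return [i for i, s in enumerate(token_saliency(curr, prev)) if s > tau]


def relative_change(curr: Tokens, prev: Tokens) -> float:
    """Relative L2 change of the whole hidden state (assumed cache statistic)."""
    num = sum(token_saliency(curr, prev))
    den = sum(p ** 2 for pt in prev for p in pt)
    return (num / den) ** 0.5 if den > 0 else float("inf")


def linear_approx(tokens: Tokens, weight: List[List[float]], bias: List[float]) -> Tokens:
    """Cheap stand-in for a transformer block: W h + b applied to each token."""
    return [
        [sum(w * h for w, h in zip(row, tok)) + b for row, b in zip(weight, bias)]
        for tok in tokens
    ]


def cached_block(
    curr: Tokens,
    prev: Optional[Tokens],
    block: Callable[[Tokens], Tokens],
    weight: List[List[float]],
    bias: List[float],
    threshold: float,
) -> Tuple[Tokens, bool]:
    """Skip the full block when the hidden state barely changed since `prev`.

    Returns (output, used_cache). In a real system `weight` and `bias` would be
    learned; here they are plain inputs.
    """
    if prev is not None and relative_change(curr, prev) < threshold:
        return linear_approx(curr, weight, bias), True
    return block(curr), False


# Toy usage: a "block" that doubles every activation, identity linear map.
prev_h = [[1.0, 0.0], [0.0, 1.0]]
small_step = [[1.0, 0.01], [0.0, 1.0]]   # nearly unchanged -> linear path
large_step = [[3.0, 0.0], [0.0, 1.0]]    # token 0 moved a lot -> full block
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
double = lambda toks: [[2 * x for x in t] for t in toks]

out_small, hit_small = cached_block(small_step, prev_h, double, W, b, threshold=0.1)
out_large, hit_large = cached_block(large_step, prev_h, double, W, b, threshold=0.1)
salient = select_tokens(large_step, prev_h, tau=0.5)
```

In this toy setup the near-identical state takes the linear path while the state with one strongly changed token falls back to the full block, and only that token is flagged as salient. The threshold trades speed against approximation error, which is the quantity the paper's theoretical analysis is said to bound.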

Code & Models

Repositories

NoakLiu/FastCache-xDiT
github
