TL;DR
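This paper introduces L2P, a learnable linear predictor for feature caching in Diffusion Transformers (DiTs). It replaces the fixed coefficients of hand-crafted forecasting formulas with learned per-timestep weights, reporting a 4.55x FLOPs reduction on FLUX.1-dev and up to 7.18x acceleration on Qwen-Image while maintaining visual fidelity.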
Contribution
L2P is a simple, data-driven caching framework: a learnable linear predictor, trained in about 20 seconds on a single GPU, that makes DiT inference more efficient than caching methods based on fixed formulas.
Findings
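L2P achieves a 4.55x FLOPs reduction and a 4.15x latency speedup on FLUX.1-dev.
L2P maintains high visual fidelity under up to 7.18x acceleration on Qwen-Image models.
L2P outperforms existing caching baselines in both speed and quality; prior methods show noticeable quality degradation at comparable acceleration.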
Abstract
To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping. We propose L2P (Learnable Linear Predictor), a simple data-driven caching framework that replaces fixed coefficients with learnable per-timestep weights. Trained in ~20 seconds on a single GPU, L2P accurately reconstructs current features from past trajectories. L2P significantly outperforms existing baselines: it achieves a 4.55x FLOPs reduction and a 4.15x latency speedup on FLUX.1-dev, and maintains high visual fidelity under up to 7.18x acceleration on Qwen-Image models, where prior methods show noticeable quality degradation. Our results show that learning linear predictors is highly effective for efficient DiT inference. Code is available at…
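The core idea in the abstract, predicting the current feature as a weighted sum of cached past features with weights learned per timestep rather than fixed by a formula, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the synthetic trajectories, the history length `k=2`, the least-squares fit, and the function names `fit_per_timestep_weights` and `predict` are all assumptions made for illustration. The fixed baseline used for comparison is plain first-order extrapolation, `2*f[t-1] - f[t-2]`.

```python
import numpy as np

def fit_per_timestep_weights(traj, k=2):
    """Fit one weight vector per timestep by least squares (illustrative only).

    traj: array of shape (N, T, D) holding N calibration trajectories,
          T timesteps, D feature dimensions.
    Returns an array of shape (T, k); rows for t < k are left at zero.
    """
    n, t_steps, d = traj.shape
    weights = np.zeros((t_steps, k))
    for t in range(k, t_steps):
        # Column j holds the feature from j+1 steps in the past, flattened.
        past = np.stack(
            [traj[:, t - j - 1, :].reshape(-1) for j in range(k)], axis=1
        )
        target = traj[:, t, :].reshape(-1)
        weights[t], *_ = np.linalg.lstsq(past, target, rcond=None)
    return weights

def predict(traj, weights, t):
    """Predict features at step t as a weighted sum of the k previous steps."""
    k = weights.shape[1]
    return sum(weights[t, j] * traj[:, t - j - 1, :] for j in range(k))

# Synthetic smooth trajectories standing in for cached DiT features (assumption).
rng = np.random.default_rng(0)
N, T, D = 32, 20, 16
s = np.linspace(0.0, 1.0, T)[None, :, None]
a = rng.normal(size=(N, 1, D))
b = rng.normal(size=(N, 1, D))
traj = a * np.exp(-3.0 * s) + b * s**2 + 0.01 * rng.normal(size=(N, T, D))

weights = fit_per_timestep_weights(traj, k=2)

# Compare learned weights against the fixed extrapolation 2*f[t-1] - f[t-2].
learned_sq, fixed_sq = [], []
for t in range(2, T):
    learned_sq.append(np.mean((predict(traj, weights, t) - traj[:, t]) ** 2))
    fixed_sq.append(np.mean((2 * traj[:, t - 1] - traj[:, t - 2] - traj[:, t]) ** 2))
learned_err = float(np.mean(learned_sq))
fixed_err = float(np.mean(fixed_sq))

print(f"learned predictor MSE: {learned_err:.6f}")
print(f"fixed extrapolation MSE: {fixed_err:.6f}")
```

Because the fixed coefficients `(2, -1)` are one possible choice of weights, the least-squares fit cannot do worse than the fixed formula on the calibration data. Whether that advantage carries over to unseen prompts and to predicting across several skipped steps is what the paper's experiments address; this sketch does not show it.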
