LayerCache: Exploiting Layer-wise Velocity Heterogeneity for Efficient Flow Matching Inference

Guandong Li

arXiv:2604.16492 · cs.CV · April 21, 2026

TL;DR

LayerCache is a layer-wise caching strategy for Flow Matching models. It exploits the heterogeneous velocity dynamics of different Transformer layer groups, reaching a reported 1.37x speedup while improving image quality metrics relative to prior caching methods.

Contribution

The paper proposes a layer-aware caching framework with adaptive scheduling that makes independent caching decisions for each layer group, and reports that it outperforms prior caching methods by exploiting layer heterogeneity in Transformer-based models.

Findings

01. Achieves 1.37x speedup with improved image quality metrics.

02. Reduces LPIPS by 70% compared to prior caching methods.

03. Outperforms all prior caching methods on the quality-speed Pareto frontier.

Abstract

Flow Matching models achieve state-of-the-art image generation quality but incur substantial inference cost due to iterative denoising through large Transformer networks. We observe that different layer groups within a Transformer exhibit markedly heterogeneous velocity dynamics: shallow layers are highly stable and amenable to aggressive caching, while deep layers undergo large velocity changes that demand full computation. Existing caching methods, however, treat the entire Transformer as a monolithic unit, applying a single caching decision per timestep and thus failing to exploit this heterogeneity. Based on this finding, we propose LayerCache, a layer-aware caching framework that partitions the Transformer into layer groups and makes independent, per-group caching decisions at each denoising step. LayerCache introduces an adaptive JVP span K selection mechanism that leverages…
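
The per-group caching idea in the abstract can be illustrated with a minimal sketch. The abstract is cut off before it describes the actual decision rule, so this sketch substitutes a simple relative-change threshold on each group's input as a stand-in; it is not the paper's JVP-based mechanism. The names `GroupCache`, `run_step`, and `threshold` are hypothetical, and the layer groups are plain Python callables rather than real Transformer blocks.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def relative_change(current: Sequence[float], reference: Sequence[float]) -> float:
    """L1 relative change between two equally sized vectors."""
    num = sum(abs(c - r) for c, r in zip(current, reference))
    den = sum(abs(r) for r in reference) + 1e-8
    return num / den


class GroupCache:
    """Cache for one layer group (hypothetical helper, not from the paper)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # assumed per-group tolerance
        self.last_input: Optional[Vector] = None
        self.last_residual: Optional[Vector] = None

    def should_reuse(self, x: Sequence[float]) -> bool:
        # Stand-in criterion: reuse when the group's input has barely moved
        # since the last time this group was fully computed.
        if self.last_input is None:
            return False
        return relative_change(x, self.last_input) < self.threshold


def run_step(
    x: Vector,
    groups: Sequence[Callable[[Vector], Vector]],
    caches: Sequence[GroupCache],
) -> Tuple[Vector, List[bool]]:
    """Run one denoising step, deciding independently per group whether to reuse."""
    reused: List[bool] = []
    for group, cache in zip(groups, caches):
        if cache.should_reuse(x):
            residual = cache.last_residual  # skip the group's computation
            reused.append(True)
        else:
            out = group(x)  # full computation for this group
            residual = [o - xi for o, xi in zip(out, x)]
            cache.last_input = list(x)
            cache.last_residual = residual
            reused.append(False)
        x = [xi + ri for xi, ri in zip(x, residual)]
    return x, reused


# Toy usage: a "shallow" group with a loose threshold and a "deep" group
# with a tight one, mirroring the stability contrast the abstract describes.
toy_groups = [lambda v: [2.0 * a for a in v], lambda v: [a + 1.0 for a in v]]
toy_caches = [GroupCache(threshold=0.1), GroupCache(threshold=0.001)]
_, first_flags = run_step([1.0, 2.0], toy_groups, toy_caches)
_, second_flags = run_step([1.01, 2.0], toy_groups, toy_caches)
print(first_flags, second_flags)
```

In this toy setup the first step computes both groups, while the second step reuses the shallow group's cached residual and recomputes the deep group. A monolithic cache would have to make one decision for both groups at that step.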
