MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
Huanlin Gao, Ping Chen, Fuyuan Shi, Ruijia Wu, Li YanTao, Qiang Hui, Yuren You, Ting Lu, Chao Tan, Shaoan Zhao, Zhaoxiang Liu, Fang Zhao, Kai Wang, Shiguo Lian

TL;DR
MeanCache introduces an average-velocity approach using cached Jacobian-vector products to improve flow matching inference efficiency and stability, significantly accelerating inference while maintaining high quality.
Contribution
It proposes a novel average-velocity perspective and a trajectory-stability scheduling strategy to enhance caching methods for flow matching inference.
Findings
Achieves up to 4.56X acceleration in experiments on FLUX.1, Qwen-Image, and HunyuanVideo.
Outperforms state-of-the-art caching methods in generation quality.
Provides a new perspective for stability-driven acceleration in generative models.
Abstract
We present MeanCache, a training-free caching framework for efficient Flow Matching inference. Existing caching methods reduce redundant computation but typically rely on instantaneous velocity information (e.g., feature caching), which often leads to severe trajectory deviations and error accumulation under high acceleration ratios. MeanCache introduces an average-velocity perspective: by leveraging cached Jacobian-vector products (JVP) to construct interval average velocities from instantaneous velocities, it effectively mitigates local error accumulation. To further improve cache timing and JVP reuse stability, we develop a trajectory-stability scheduling strategy as a practical tool, employing a Peak-Suppressed Shortest Path under budget constraints to determine the schedule. Experiments on FLUX.1, Qwen-Image, and HunyuanVideo demonstrate that MeanCache achieves 4.12X and 4.56X and…
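The average-velocity idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: a scalar toy field dz/dt = -z stands in for the network, time runs forward, and all function names are hypothetical. The sketch relies on the interval identity $u(r,t) = v(z_t,t) - (t-r)\,\frac{d}{dt}u(r,t)$, where $u(r,t)$ is the average velocity over $[r,t]$. It reads a JVP-like term off the already computed interval $[r,t]$ and reuses it to estimate the average velocity over the skipped interval $[t,s]$. The paper's actual estimator may differ.

```python
import math

def velocity(z, t):
    """Toy instantaneous velocity dz/dt = -z (hypothetical stand-in for the model)."""
    return -z

def exact_state(z0, t):
    """Closed-form solution of the toy ODE, used as the reference trajectory."""
    return z0 * math.exp(-t)

def skip_with_instantaneous(z_t, v_t, t, s):
    """Baseline: reuse the instantaneous velocity at t across the whole skip [t, s]."""
    return z_t + (s - t) * v_t

def skip_with_average(z_r, z_t, v_t, r, t, s):
    """Assumed estimator: build an average velocity on [t, s] from a reused JVP term."""
    u_past = (z_t - z_r) / (t - r)        # realized average velocity on [r, t]
    jvp_past = (v_t - u_past) / (t - r)   # solves u(r,t) = v_t - (t - r) * d/dt u(r,t)
    u_future = v_t + (s - t) * jvp_past   # reuse the past JVP term for [t, s]
    return z_t + (s - t) * u_future

# r and t are fully computed steps; s is reached by skipping.
z0, r, t, s = 1.0, 0.0, 0.1, 0.3
z_r, z_t = exact_state(z0, r), exact_state(z0, t)
v_t = velocity(z_t, t)
z_true = exact_state(z0, s)

err_inst = abs(skip_with_instantaneous(z_t, v_t, t, s) - z_true)
err_mean = abs(skip_with_average(z_r, z_t, v_t, r, t, s) - z_true)
print(f"instantaneous-velocity error: {err_inst:.5f}")
print(f"average-velocity error:       {err_mean:.5f}")
```

On this toy problem the average-velocity update lands closer to the reference state than the instantaneous-velocity update, because the reused term captures how the velocity drifts along the trajectory. Whether that carries over to a real model depends on how well the past interval predicts the next one, which is the concern the reviews raise about the span $K$.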
Peer Reviews
Decision·ICLR 2026 Poster
* MeanCache presents a conceptually simple shift (using average vs instantaneous velocity) yet this new perspective is powerful. The paper explains this insight clearly.
* It achieves large acceleration on realistic large models while retaining high fidelity, outperforming prior caching schemes across image/video tasks.
* The use of the MeanFlow identity and JVP to bridge instantaneous and average velocity is well-founded.
* The trajectory-stability scheduling (peak-suppressed shortest path)
* The method depends on approximating a future JVP from past states. As the authors note, the choice of interval K is “critical” and this is a trade-off between error and stability. If the model’s dynamics are highly non-linear, the approximation might degrade. The paper addresses this with scheduling, but it remains an approximation-dependent approach.
* There are several hyperparameters (cache span $K$, budget $\mathcal{B}$, peak-penalty $\gamma$). While the paper provides ablation studies, c
1. MeanCache consistently outperforms state-of-the-art baselines, especially at high acceleration ratios where competing methods collapse.
2. The inclusion of both perceptual metrics and reconstruction metrics provides thorough quality assessment.
3. The paper is very well-written, clearly motivating the problem and lucidly explaining the proposed methodology.
1. The paper lacks theoretical analysis (e.g., error bounds) explaining why JVP-based average velocity outperforms TaylorSeer's Taylor expansion, leaving the source of empirical gains unclear.
2. Constructing the multigraph and computing shortest paths incurs preprocessing cost. Tables 1-2 report only inference latency, not total time including preprocessing.
3. TaylorSeer encounters OOM on HunyuanVideo, forcing CPU-offload for all methods. This may artificially inflate latency measurements and d
+ Clear, practical idea: Average velocity caching with a JVP estimator is simple yet impactful; it addresses the well-known error accumulation issue at high acceleration ratios.
+ Scheduling formalization: Casting cache placement as a constrained shortest path problem is principled and provides a tunable trade-off via $\gamma$ and budget $B$.
+ Strong empirical results: Consistent improvements over strong baselines across image and video; qualitative figures align with quantitative gains.
- Heuristic JVP reuse: Approximating $JVP_{t\to s}$ by $JVP_{r\to t}$ is empirically motivated. The paper does not provide theoretical guarantees or error bounds beyond the MeanFlow identities. Performance depends on the span $K$, which is tuned and schedule-dependent.
- Stability assumption: The scheduling assumes that “relative changes at fixed timesteps are highly consistent across samples.” This is plausible but not statistically substantiated. Some quantification (e.g., cross prompt correla
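The reviews describe cache placement as a budget-constrained shortest path with a peak penalty $\gamma$. A minimal sketch of that kind of formulation follows. The details are assumptions made for illustration and are not taken from the paper: nodes are timesteps, an edge `i -> j` means "compute fully at step `i`, reuse the cache up to step `j`", the budget is the number of edges, and peak suppression is modeled by raising each edge cost to the power `gamma`. The names `plan_schedule` and `span_cost` are hypothetical.

```python
def plan_schedule(cost, n_steps, budget, gamma=2.0):
    """Cheapest path from step 0 to n_steps using exactly `budget` edges.

    cost(i, j) is a user-supplied instability estimate for skipping from i to j.
    Raising it to `gamma` > 1 penalizes single large edges (assumed peak penalty).
    """
    inf = float("inf")
    # best[b][j]: lowest total cost to reach step j using exactly b edges.
    best = [[inf] * (n_steps + 1) for _ in range(budget + 1)]
    parent = [[None] * (n_steps + 1) for _ in range(budget + 1)]
    best[0][0] = 0.0
    for b in range(1, budget + 1):
        for j in range(1, n_steps + 1):
            for i in range(j):
                if best[b - 1][i] == inf:
                    continue
                candidate = best[b - 1][i] + cost(i, j) ** gamma
                if candidate < best[b][j]:
                    best[b][j] = candidate
                    parent[b][j] = i
    if best[budget][n_steps] == inf:
        raise ValueError("no schedule fits the budget")
    # Walk the parent pointers back from the final step to recover the path.
    path, j = [n_steps], n_steps
    for b in range(budget, 0, -1):
        j = parent[b][j]
        path.append(j)
    return path[::-1], best[budget][n_steps]

def span_cost(i, j):
    """Toy cost: instability grows linearly with the length of the skip."""
    return j - i

path, total = plan_schedule(span_cost, n_steps=6, budget=3, gamma=2.0)
print(path, total)
```

With this toy cost and `gamma=1.0`, every three-edge path would cost the same, so the schedule would be arbitrary. With `gamma=2.0`, long skips are penalized more than proportionally and the planner spreads the full computations evenly. A real schedule would replace `span_cost` with measured trajectory-stability statistics.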
Taxonomy
Topics: Time Series Analysis and Forecasting · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
