FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching

Divya Jyoti Bajpai; Shubham Agarwal; Apoorv Saxena; Kuldeep Kulkarni; Subrata Mitra; Manjesh Kumar Hanawal

arXiv:2602.01329·cs.CV·February 3, 2026

TL;DR

FlowCast is a training-free framework that accelerates Flow Matching inference for visual generation by speculating future velocities, enabling over 2.5x faster inference while maintaining comparable quality.

Contribution

FlowCast introduces a novel speculative generation method leveraging constant velocity extrapolation, eliminating the need for retraining or auxiliary networks.
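
As a rough illustration of why constant-velocity extrapolation is a natural draft, consider the linear interpolation path commonly used in Flow Matching. The path, the symbols $x_0$ (source sample), $x_1$ (target sample), $\Delta t$, $k$, and the hatted draft quantities are assumptions introduced here for illustration and are not taken from the paper:

```latex
\begin{align}
x_t &= (1 - t)\,x_0 + t\,x_1, \qquad t \in [0, 1], \\
\frac{d x_t}{dt} &= x_1 - x_0 \quad \text{(constant in } t\text{)}, \\
\hat{x}_{t + k\Delta t} &= x_t + k\,\Delta t\; v_\theta(x_t, t), \qquad
\hat{v}_{t + k\Delta t} = v_\theta(x_t, t).
\end{align}
```

Under this assumption the regression target is constant along each path, so reusing the current velocity $v_\theta(x_t, t)$ as the draft for later steps costs no extra model evaluation; it is only an approximation wherever the learned field curves.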

Findings

01

Achieves over 2.5x speedup in image and video generation tasks.

02

Maintains quality comparable to sampling the full, unskipped trajectory.

03

Provides theoretical bounds on trajectory deviation.

Abstract

Flow Matching (FM) has recently emerged as a powerful approach for high-quality visual generation. However, the prohibitively slow inference of FM models, caused by the large number of denoising steps, limits their use in real-time or interactive applications. Existing acceleration methods, such as distillation, truncation, or consistency training, either degrade quality, incur costly retraining, or lack generalization. We propose FlowCast, a training-free speculative generation framework that accelerates inference by exploiting the fact that FM models are trained to preserve constant velocity. FlowCast speculates future velocity by extrapolating current velocity without incurring additional time cost, and accepts it if it is within a mean-squared error threshold. This constant-velocity forecasting allows redundant steps in stable regions to be aggressively skipped while retaining precision in…
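
The draft-and-verify idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `speculative_euler`, the Euler integrator, the parameters `max_draft` and `eps`, and the rule of accepting the longest prefix of drafts are all hypothetical choices made for this example.

```python
import numpy as np


def speculative_euler(velocity_fn, x0, num_steps=50, max_draft=4, eps=0.01):
    """Euler sampler that skips steps whose velocity matches the current one.

    velocity_fn(x, t) stands in for the FM model (hypothetical signature).
    Returns the final state and the number of outer iterations taken.
    """
    dt = 1.0 / num_steps
    x = np.asarray(x0, dtype=float)
    step = 0
    outer_iters = 0
    while step < num_steps:
        t = step * dt
        v = velocity_fn(x, t)  # base velocity at the current state
        outer_iters += 1
        # Draft: reuse v as the guess for up to max_draft future steps.
        n_draft = min(max_draft, num_steps - step - 1)
        accepted = 0
        for k in range(1, n_draft + 1):
            x_k = x + k * dt * v  # state reached by extrapolating with v
            # Verification; a parallel implementation would batch these calls.
            v_k = velocity_fn(x_k, t + k * dt)
            if np.mean((v_k - v) ** 2) > eps:
                break  # first rejected draft ends the accepted prefix
            accepted += 1
        # One regular step plus all accepted drafted steps, all using v.
        x = x + (accepted + 1) * dt * v
        step += accepted + 1
    return x, outer_iters


# Usage: a constant velocity field lets every draft be accepted.
c = np.array([1.0, -2.0])
x_final, iters = speculative_euler(lambda x, t: c, np.zeros(2))
```

With a constant field every draft passes, so each outer iteration covers `max_draft + 1` Euler steps; setting `eps` below zero rejects every draft and recovers plain Euler sampling. The sketch verifies drafts sequentially for clarity, so it shows the acceptance logic rather than any wall-clock speedup.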

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 8·Confidence 3

Strengths

- The proposed method is training-free and model-agnostic.
- The key insight is good. The method pinpoints and exploits a core property of FM (local smoothness and near-constant velocities), turning the model’s own current velocity into a “zero-cost” draft and verifying in velocity space. This aligns perfectly with FM’s dynamics, enabling aggressive step skipping in stable regions while preserving fidelity, and it explains both the simplicity and effectiveness of the approach.
- Substantial inferen

Weaknesses

- In Table 1, provide profiling of the memory footprint and latency of the speculative method vs. existing methods.
- Discuss failure cases or limitations of the proposed method.
- The manuscript organization can be improved, for example Eq. 5 at L244.
- A more comprehensive comparison of the proposed method in terms of computational complexity, memory usage, and so on is expected.

Reviewer 02·Rating 6·Confidence 4

Strengths

+ FlowCast's core mechanism uses the model's own previous velocity prediction as a draft, incurring zero additional training or computational overhead for the drafting phase.
+ The framework integrates seamlessly with any existing FM model and is successfully demonstrated across image generation, image editing, multi-turn editing, and video generation tasks, showcasing its broad applicability.
+ FlowCast dynamically adapts the number of steps skipped based on the local complexity of the genera

Weaknesses

The main practical limitation acknowledged in the conclusion is FlowCast's dependence on adequate compute for parallel verification of drafts, where cutting drafts reduces overhead but also limits speedup. While the verification step is performed in a single forward pass for parallel drafts, the overall efficiency gain is bounded by the available parallelism. While the paper provides empirical guidance for setting the threshold $\epsilon$ for different tasks (e.g., $\epsilon \in [0.01, 0.02]$ fo

Reviewer 03·Rating 2·Confidence 2

Strengths

- The paper is well-written and easy to follow.
- The method is training-free and plug-and-play.

Weaknesses

- The motivation for predicting future velocity from known velocities is not fully justified. In few-step sampling, the velocity actually varies considerably between adjacent steps.
- Experiments on text-to-image generation are not very comprehensive; only some cherry-picked simple cases are shown.
- The performance drop (e.g., from 0.78 to 0.57) is unacceptable. In image generation, high quality is preferred over speed.

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Image Enhancement Techniques