Terminal Velocity Matching

Linqi Zhou; Mathias Parger; Ayaan Haque; Jiaming Song

arXiv:2511.19797·cs.LG·February 18, 2026

TL;DR

Terminal Velocity Matching (TVM) is a generative modeling approach that generalizes flow matching: it models the transition between any two diffusion timesteps and regularizes that transition at its terminal time, reaching 3.29 FID with 1 NFE and 1.99 FID with 4 NFEs on ImageNet-256x256.

Contribution

TVM introduces a new framework for high-fidelity one- and few-step generative modeling, with architectural modifications for stability and a fused attention kernel for efficiency.

Findings

1. Achieves 3.29 FID with 1 NFE on ImageNet-256x256
2. Achieves 1.99 FID with 4 NFEs on ImageNet-256x256
3. State-of-the-art performance among one- and few-step models trained from scratch

Abstract

We propose Terminal Velocity Matching (TVM), a generalization of flow matching that enables high-fidelity one- and few-step generative modeling. TVM models the transition between any two diffusion timesteps and regularizes its behavior at its terminal time rather than at the initial time. We prove that TVM provides an upper bound on the $2$-Wasserstein distance between data and model distributions when the model is Lipschitz continuous. However, since Diffusion Transformers lack this property, we introduce minimal architectural changes that achieve stable, single-stage training. To make TVM efficient in practice, we develop a fused attention kernel that supports backward passes on Jacobian-Vector Products, which scale well with transformer architectures. On ImageNet-256x256, TVM achieves 3.29 FID with a single function evaluation (NFE) and 1.99 FID with 4 NFEs. It similarly achieves…
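As a rough illustration of what "modeling the transition between any two timesteps" means, the sketch below uses a toy linear ODE whose exact flow map is known in closed form. Everything here (the rate `A`, the functions `velocity`, `jump`, `sample`, and the 0-to-1 time convention) is a hypothetical stand-in and not the paper's model or notation. It checks two things: composing several jumps gives the same endpoint as a single jump, and the derivative of the jump with respect to its terminal time equals the velocity at the jump's endpoint.

```python
import math

A = -0.7  # hypothetical rate of the toy linear velocity field


def velocity(x, t):
    """Toy velocity field u(x, t) = A * x (time-independent for simplicity)."""
    return A * x


def jump(x, s, t):
    """Exact flow map of dx/dt = A * x: carries a state at start time s to terminal time t."""
    return x * math.exp(A * (t - s))


def sample(x0, num_steps):
    """Go from time 0 to time 1 with num_steps jumps (one function evaluation each)."""
    x = x0
    for k in range(num_steps):
        s, t = k / num_steps, (k + 1) / num_steps
        x = jump(x, s, t)
    return x


x0 = 1.5
one_step = sample(x0, 1)    # analogous to 1 NFE
four_step = sample(x0, 4)   # analogous to 4 NFEs

# Terminal-time check via central finite differences:
# d/dt jump(x, s, t) should equal velocity(jump(x, s, t), t).
s, t, h = 0.2, 0.9, 1e-6
d_dt = (jump(x0, s, t + h) - jump(x0, s, t - h)) / (2 * h)
terminal_gap = abs(d_dt - velocity(jump(x0, s, t), t))
```

For an exact flow map the one-step and four-step results coincide. A learned jump is only approximate, which is presumably why the reported FID improves from 3.29 to 1.99 when moving from 1 to 4 NFEs.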

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

1. TVM reframes the problem of learning long-horizon ODE jumps as a terminal velocity condition (Eq. 6–7), providing a clean theoretical link between displacement error and velocity matching.
2. Theorem 1 establishes a distribution-level guarantee ($W_2$ upper bound) without requiring multiple particles (unlike IMM).
3. Duality with MeanFlow is clearly articulated: The paper shows MeanFlow matches initial velocity while TVM matches terminal velocity (Appendix E.1), offering a compelling symmetry
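The initial/terminal duality in point 3 can be related to two standard identities for the flow map of an ODE. The notation below ($\psi$, $u$, $s$, $t$) is an assumption made for illustration and need not match the paper's Eq. 6–7 or Appendix E.1.

```latex
% Flow map of dx_r/dr = u(x_r, r): \psi(x, s, t) carries a state x at time s to time t.
\begin{align}
\partial_t \psi(x, s, t) &= u\bigl(\psi(x, s, t),\, t\bigr)
  && \text{(derivative in the terminal time)} \\
\partial_s \psi(x, s, t) &= -\nabla_x \psi(x, s, t)\, u(x, s)
  && \text{(derivative in the initial time)}
\end{align}
```

The first identity involves only the velocity evaluated at the jump's endpoint. The second follows from the fact that $\psi(x_s, s, t)$ stays constant as $s$ moves along a trajectory, and it contains a Jacobian-vector product of the flow map with the velocity at the starting point as the tangent.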

Weaknesses

1. While inference is fast, training cost (FLOPs, GPU-hours) vs. MeanFlow, sCT, or IMM is not reported.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The theoretical formulation is elegant and well-motivated, linking Flow Matching to a Wasserstein upper bound through terminal velocity constraints.
2. The method achieves excellent efficiency–quality trade-offs, outperforming existing one-step and few-step baselines (e.g., Consistency Models, MeanFlow) on ImageNet-256.
3. The architectural refinements (semi-Lipschitz normalization, FlashAttention JVP) are practically valuable contributions that could generalize to other diffusion or flow-bas

Weaknesses

1. While theoretically grounded, the intuition behind “terminal velocity” could be elaborated further, especially how it differs in practice from midpoint or integral matching.
2. The scope of evaluation is limited to class-conditional ImageNet-256. Demonstrating robustness on higher-resolution or unconditional datasets (e.g., ImageNet-512, COCO) would strengthen generality claims.
3. The paper relies on a single architecture (DiT-XL/2). It is unclear whether TVM’s benefits extend to U-Net–based

Reviewer 03 · Rating 6 · Confidence 5

Strengths

1. **Boundary condition at terminal time** By enforcing velocity matching at the terminal rather than initial timestep, the method avoids evaluating JVPs involving guided velocities that often exhibit large norms and high variance during training. This could be more beneficial for stabilizing training when scaling to larger dimensionality, where the guided velocities often exhibit even larger norms and higher variance.
2. **Simple and stable training recipe** The paper achieves stable one-sta

Weaknesses

1. **Backpropagation through JVP.** Unlike prior continuous-time consistency models where JVP terms are detached from the gradient graph (sCM, MeanFlow, etc.), TVM explicitly backpropagates through the JVP term, introducing additional computational cost. This could become prohibitive for large-scale models. Providing quantitative analysis (e.g., runtime, memory, or gradient-cost overhead relative to MeanFlow) would help address the concern.
2. **Insufficient ablations.** While the design choice
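The difference between detaching a JVP term and backpropagating through it (weakness 1) can be seen on a scalar toy model with hand-derived derivatives. This is a minimal sketch under stated assumptions: the "network" `f`, the loss, and all constants are hypothetical, and the loss is not the TVM objective. It only shows that the gradient through the JVP needs an extra second-order term that the detached variant skips.

```python
import math


def f(x, w):
    """Toy scalar 'network': f(x; w) = tanh(w * x)."""
    return math.tanh(w * x)


def jvp_x(x, w, v):
    """Jacobian-vector product of f with respect to x along tangent v (hand-derived)."""
    th = math.tanh(w * x)
    return w * (1.0 - th * th) * v


def loss(w, x, v, y):
    """Toy loss containing a JVP term; NOT the TVM objective."""
    return (f(x, w) + jvp_x(x, w, v) - y) ** 2


def grads(w, x, v, y):
    """Return (gradient with the JVP treated as a constant, gradient through the JVP)."""
    th = math.tanh(w * x)
    sech2 = 1.0 - th * th
    residual = f(x, w) + jvp_x(x, w, v) - y
    df_dw = x * sech2                               # first-order term only
    djvp_dw = v * sech2 * (1.0 - 2.0 * w * x * th)  # needs a second derivative of tanh
    detached = 2.0 * residual * df_dw
    full = 2.0 * residual * (df_dw + djvp_dw)
    return detached, full


w, x, v, y = 0.8, 0.5, 1.3, 0.1
g_detached, g_full = grads(w, x, v, y)

# Central finite difference of the full loss, used as a reference for the true gradient.
h = 1e-6
g_fd = (loss(w + h, x, v, y) - loss(w - h, x, v, y)) / (2 * h)
```

Here `g_full` agrees with the finite-difference reference while `g_detached` does not, because the detached variant drops `djvp_dw`. In a real network that extra term corresponds to a backward pass over the JVP computation, which is the overhead the reviewer asks to have quantified.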

Code & Models

Models

🤗 lumaai/tvm (model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Model Reduction and Neural Networks