The Generation Phases of Flow Matching: a Denoising Perspective

Anne Gagneux; Ségolène Martin; Rémi Gribonval; Mathurin Massias

arXiv:2510.24830·cs.CV·December 22, 2025

Open Access · 3 Reviews

TL;DR

This paper explores the generation process of flow matching models from a denoising perspective, revealing distinct phases and factors influencing sample quality, and providing a framework for controlled perturbations to improve generation.

Contribution

It establishes formal connections between flow matching and denoisers, enabling analysis of generation phases and influencing factors for improved understanding and control.

Findings

01

Identifies distinct dynamical phases in flow matching generation

02

Provides a framework to analyze denoiser success and failure stages

03

Suggests principled perturbations to enhance sample quality

Abstract

Flow matching has achieved remarkable success, yet the factors influencing the quality of its generation process remain poorly understood. In this work, we adopt a denoising perspective and design a framework to empirically probe the generation process. Laying down the formal connections between flow matching models and denoisers, we provide a common ground to compare their performances on generation and denoising. This enables the design of principled and controlled perturbations to influence sample generation: noise and drift. This leads to new insights on the distinct dynamical phases of the generative process, enabling us to precisely characterize at which stage of the generative process denoisers succeed or fail and why this matters.
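The "formal connections between flow matching models and denoisers" mentioned in the abstract can be sketched as follows. This is a minimal derivation under an assumed convention (linear interpolation between Gaussian noise $x_0$ and data $x_1$); the symbols $D^\star$, $D_\theta$, $v^\star$, and $v_\theta$ are introduced here for illustration, and the paper's own notation may differ.

```latex
% Assumed convention: x_0 ~ N(0, I) is noise, x_1 ~ p_data is a clean sample.
\begin{align*}
  x_t &= (1-t)\,x_0 + t\,x_1, \qquad t \in [0, 1), \\
  % Since x_1 - x_t = (1-t)(x_1 - x_0), the conditional velocity target is
  x_1 - x_0 &= \frac{x_1 - x_t}{1-t}, \\
  % Taking the conditional expectation given x_t = x yields the velocity identity,
  % with D^\star(x,t) = E[x_1 | x_t = x] the MMSE denoiser:
  v^\star(x, t) &= \mathbb{E}\left[x_1 - x_0 \mid x_t = x\right]
                 = \frac{D^\star(x, t) - x}{1-t}, \\
  % Parameterizing v_\theta(x,t) = (D_\theta(x,t) - x)/(1-t) turns the FM loss
  % into a weighted denoising loss:
  \mathbb{E}\,\bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2
    &= \left(\frac{1}{1-t}\right)^{2}
       \mathbb{E}\,\bigl\| D_\theta(x_t, t) - x_1 \bigr\|^2 .
\end{align*}
```

Under this convention the weight $\left(\frac{1}{1-t}\right)^2$ grows as $t \to 1$, that is, as the noise level shrinks.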

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 2

Strengths

- The paper cleanly derives the MMSE denoiser (velocity identity) and uses it to recast FM training as weighted denoising, unifying multiple losses (FM, classical, unweighted) in one framework. This makes design choices (weights, parameterizations) explicit and comparable.
- It offers new insight into temporal regimes. By comparing Jacobian spectral norms, the paper shows that the closed-form target has an early Lipschitz peak (at trajectory “splitting”), which trained models smooth out.
- Residual pa

Weaknesses

- Most results use small image benchmarks (CIFAR-10, CelebA-64), a single ODE solver (dopri5, 100 steps), and closely related U-Net-style architectures with EMA; this limits claims about “phases” to low-resolution image FM under specific integration and training regimes. Stronger evidence would test higher resolutions and more diverse datasets (e.g., ImageNet-256/512), other samplers and step counts (explicit compute-vs-quality curves), and other architectures (non-U-Net backbones, transformer variants)

Reviewer 02 · Rating 2 · Confidence 3

Strengths

- The idea of constructing the "denoising toolkit" can be fruitful.
- The analysis of perturbations in Section 5 is one of the most interesting points. The distinction between drift-type and noise-type perturbations, and their dependence on time, is an interesting observation that contributes to a better understanding of Flow Matching dynamics. The statement that models with similar FID scores can exhibit different generative behaviors is important.

Weaknesses

- The paper lacks theoretical justification, or any intuition, that would explain the numerical results presented. While the empirical work is extensive, the paper falls short of providing a satisfying explanation for its most critical observations:
  * Why does the FM loss weighting $\left(\frac{1}{1-t}\right)^2$, which emphasizes easy (low-noise) denoising tasks, yield the best generative models? This is counter-intuitive and demands a deeper hypothesis beyond its empirical success.
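To make the weighting discussed in this review concrete, here is a minimal numerical sketch. It assumes the linear path x_t = (1 - t) x_0 + t x_1 with Gaussian noise x_0, which may differ from the paper's exact setup; the toy data, `toy_denoiser`, and `fm_and_denoising_losses` are hypothetical names introduced for illustration only. It checks that, for any denoiser, the FM loss of the induced velocity equals the denoising loss scaled by 1/(1-t)^2, and shows how quickly that weight grows at low noise.

```python
import numpy as np

def fm_and_denoising_losses(t, x0, x1, denoiser):
    """Return (FM loss, weighted denoising loss) at time t for a batch."""
    xt = (1.0 - t) * x0 + t * x1           # assumed linear interpolation path
    d = denoiser(xt, t)                    # estimate of the clean sample x1
    v = (d - xt) / (1.0 - t)               # velocity induced by the denoiser
    fm = np.mean(np.sum((v - (x1 - x0)) ** 2, axis=1))
    weight = 1.0 / (1.0 - t) ** 2          # weighting quoted in the review
    dn = weight * np.mean(np.sum((d - x1) ** 2, axis=1))
    return fm, dn

rng = np.random.default_rng(0)
x0 = rng.standard_normal((512, 2))                # noise samples
x1 = 3.0 + 0.5 * rng.standard_normal((512, 2))    # toy "data" samples

def toy_denoiser(xt, t):
    # Hypothetical, deliberately imperfect denoiser (not a trained model).
    return t * xt + (1.0 - t) * 3.0

for t in (0.1, 0.5, 0.9, 0.99):
    fm, dn = fm_and_denoising_losses(t, x0, x1, toy_denoiser)
    w = 1.0 / (1.0 - t) ** 2
    print(f"t={t:.2f}  weight={w:10.1f}  FM={fm:.4f}  weighted denoising={dn:.4f}")
```

The two loss columns agree at every t, while the weight rises from roughly 1 near t = 0 to 10,000 at t = 0.99, which is the emphasis on low-noise denoising that the reviewer questions.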

Reviewer 03 · Rating 2 · Confidence 4

Strengths

The paper studies the generalization ability of flow matching models, which is an important problem.

Weaknesses

1. Though this is an empirical paper, the experiments are not well conducted. For example, the FID protocol for evaluating on 10k test images is not standard, and the model appears undertrained, as the best-reported CIFAR-10 FID (9.44) is far from the FM baseline (2.99; Lipman et al.). With such poor absolute quality, differences across variants may reflect a training deficit rather than principled phase behavior.
2. The experiments are not well designed. The 10-denoisers experiment in Section 4.

Taxonomy

Topics: Business Process Modeling and Analysis · Data Stream Mining Techniques · Innovative Microfluidic and Catalytic Techniques Innovation