TL;DR
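This paper shows that the flow matching objective of flow-based diffusion models inherently induces a two-stage training process. Analyzing the closed-form marginal velocity field explains the models' memorization-generalization behavior and the effectiveness of common training techniques, and it can guide future improvements.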
Contribution
It reveals the inherent two-stage training dynamics of flow-based diffusion models and explains the effectiveness of common training techniques.
Findings
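The early (navigation) stage, guided by a mixture of data modes, generalizes across modes to form global layouts.
The later (refinement) stage, dominated by the nearest data sample, increasingly memorizes fine-grained details.
Common practical training techniques are consistent with this two-stage process.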
Abstract
Flow-based diffusion models have emerged as a leading paradigm for training generative models across images and videos. However, their memorization-generalization behavior remains poorly understood. In this work, we revisit the flow matching (FM) objective and study its marginal velocity field, which admits a closed-form expression, allowing exact computation of the oracle FM target. Analyzing this oracle velocity field reveals that flow-based diffusion models inherently formulate a two-stage training target: an early stage guided by a mixture of data modes, and a later stage dominated by the nearest data sample. The two-stage objective leads to distinct learning behaviors: the early navigation stage generalizes across data modes to form global layouts, whereas the later refinement stage increasingly memorizes fine-grained details. Leveraging these insights, we explain the effectiveness…
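The closed-form marginal velocity mentioned in the abstract can be illustrated with a minimal sketch. The setup here is an assumption, not taken from the paper: a linear path x_t = (1 - t) x_0 + t x_1 with x_0 ~ N(0, I) and x_1 drawn uniformly from a finite dataset. Under that assumption, the oracle velocity at x is a softmax-weighted average of the per-sample velocities (x_i - x) / (1 - t), with weights proportional to exp(-||x - t x_i||^2 / (2 (1 - t)^2)). The function name `oracle_velocity` and the toy data are hypothetical.

```python
import numpy as np

def oracle_velocity(x, t, data):
    """Closed-form marginal FM velocity at point x and time t in [0, 1).

    Assumed setup (not confirmed by the paper): x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, I) and x1 uniform over the rows of `data`.
    Returns the velocity and the posterior weights over data samples.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    sigma = 1.0 - t  # std of x_t given x1 under the assumed path

    # Log posterior weights of each sample: p(x1 = data[i] | x_t = x).
    sq_dist = np.sum((x - t * data) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()

    # Conditional velocity toward each sample, then the weighted average.
    cond_vel = (data - x) / sigma
    return weights @ cond_vel, weights

# Toy dataset of three 2-D points (hypothetical).
data = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
x = np.array([0.5, 0.1])

v_early, w_early = oracle_velocity(x, 0.05, data)  # weights spread over samples
v_late, w_late = oracle_velocity(x, 0.95, data)    # weights collapse to one sample
```

In this toy case the weights at small t are spread across all samples, so the target points toward a mixture of the data, while near t = 1 nearly all weight falls on a single sample. This is in line with the two-stage picture described in the abstract, though the exact parameterization used by the authors may differ.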
