How Do Flow Matching Models Memorize and Generalize in Sample Data Subspaces?
Weiguo Gao, Ming Li

TL;DR
This paper provides theoretical insights into how Flow Matching models memorize and generalize within the sample data subspace, and introduces a decomposition network, OSDNet, that splits the velocity field to make this behavior easier to analyze.
Contribution
It introduces a theoretical framework for analyzing Flow Matching models' memorization and generalization, and proposes OSDNet to decompose velocity fields for better sample synthesis.
Findings
Under the analytically derived optimal velocity field, generated samples memorize real data points.
Off-subspace component decays during training.
Samples preserve proximity and diversity within the data subspace.
Abstract
Real-world data is often assumed to lie within a low-dimensional structure embedded in high-dimensional space. In practical settings, we observe only a finite set of samples, forming what we refer to as the sample data subspace. It serves as an essential approximation supporting tasks such as dimensionality reduction and generation. A major challenge lies in whether generative models can reliably synthesize samples that stay within this subspace rather than drifting away from the underlying structure. In this work, we provide theoretical insights into this challenge by leveraging Flow Matching models, which transform a simple prior into a complex target distribution via a learned velocity field. By treating the real data distribution as discrete, we derive analytical expressions for the optimal velocity field under a Gaussian prior, showing that generated samples memorize real data points…
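The memorization claim in the abstract can be illustrated with a small numerical sketch. It is not the paper's code, and it rests on assumptions the abstract does not state: a linear interpolation path x_t = (1 - t) x_0 + t x_1, a standard Gaussian prior for x_0, and x_1 drawn uniformly from a finite data set. Under those assumptions the marginal velocity has the well-known closed form u_t(x) = (E[x_1 | x_t = x] - x) / (1 - t), where the conditional expectation is a softmax-weighted average of the data points. The function names and the toy data below are hypothetical.

```python
import numpy as np

def optimal_velocity(x, t, data):
    """Closed-form marginal velocity for the ASSUMED path
    x_t = (1 - t) * x0 + t * x1, with x0 ~ N(0, I) and x1 uniform
    over the rows of `data`. Requires 0 <= t < 1."""
    # Given x1 = y_i, x_t is distributed as N(t * y_i, (1 - t)^2 I).
    sq_dist = np.sum((x[None, :] - t * data) ** 2, axis=1)
    logits = -sq_dist / (2.0 * (1.0 - t) ** 2)
    logits -= logits.max()                # stabilise the softmax
    weights = np.exp(logits)
    weights /= weights.sum()              # posterior P(x1 = y_i | x_t = x)
    # E[x1 - x0 | x_t = x] = (E[x1 | x_t = x] - x) / (1 - t)
    return (weights @ data - x) / (1.0 - t)

def generate(x0, data, n_steps=200, t_end=0.999):
    """Euler integration of dx/dt = u_t(x) from t = 0 up to t_end < 1."""
    x = x0.copy()
    dt = t_end / n_steps
    for k in range(n_steps):
        x = x + dt * optimal_velocity(x, k * dt, data)
    return x

# Hypothetical toy data: four points in R^3 that all lie in the plane z = 0.
data = np.array([[1.0, 0.0, 0.0],
                 [-1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, -1.0, 0.0]])

rng = np.random.default_rng(0)
samples = np.stack([generate(rng.normal(size=3), data) for _ in range(20)])

# Distance from each generated sample to its nearest data point.
nearest = np.linalg.norm(samples[:, None, :] - data[None, :, :], axis=2).min(axis=1)
# Magnitude of the component orthogonal to the data plane.
off_subspace = np.abs(samples[:, 2])
```

In this toy setting, `nearest` measures how closely each generated sample lands on a data point and `off_subspace` measures how far it sits from the plane containing the data. Because all data points have z = 0, the z-component of the velocity reduces to -z / (1 - t), so the off-plane part of the prior noise shrinks along the trajectory. This concerns the exact closed-form field only; a trained network approximates it and may behave differently.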
Taxonomy
Topics: Stock Market Forecasting Methods · Time Series Analysis and Forecasting · Traffic Prediction and Management Techniques
Methods: Sparse Evolutionary Training
