Local Flow Matching Generative Models
Chen Xu, Xiuyuan Cheng, Yao Xie

TL;DR
The paper introduces Local Flow Matching (LFM), a stepwise flow model that improves training efficiency and generative performance by learning a sequence of sub-models, each closer to the data distribution, with theoretical guarantees.
Contribution
LFM extends flow matching by decomposing the process into smaller steps, enabling faster training, theoretical guarantees, and effective distillation for improved generation.
Findings
LFM achieves faster training compared to traditional flow matching.
LFM demonstrates competitive generative quality on tabular, image, and robotic data.
Theoretical proof of a generation guarantee based on the $\chi^2$-divergence and diffusion contraction.
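For readers unfamiliar with the quantity named in the last finding, the block below recalls the standard definition of the $\chi^2$-divergence and the textbook contraction bound for an Ornstein-Uhlenbeck (OU) process. The OU normalization ($dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ with equilibrium $q = \mathcal{N}(0, I)$) and the step size symbol $\gamma$ are assumptions made here for illustration; the constants and exact statement in the paper may differ.

```latex
% Standard definition, for densities p and q with p absolutely continuous w.r.t. q:
\chi^2(p \,\|\, q)
  = \int \frac{\bigl(p(x) - q(x)\bigr)^2}{q(x)} \, dx
  = \int \frac{p(x)^2}{q(x)} \, dx - 1 .

% Textbook contraction under the OU process dX_t = -X_t\,dt + \sqrt{2}\,dW_t,
% whose equilibrium q = N(0, I) satisfies a Poincare inequality with constant 1:
\chi^2(p_t \,\|\, q) \le e^{-2t} \, \chi^2(p_0 \,\|\, q) .

% Consequence for N consecutive steps of (assumed) size \gamma:
\chi^2(p_{N\gamma} \,\|\, q) \le e^{-2\gamma N} \, \chi^2(p_0 \,\|\, q) .
```

Under these assumptions the divergence to the noise distribution shrinks geometrically in the number of steps, which is the kind of exponential decay the reviews describe.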
Abstract
Flow Matching (FM) is a simulation-free method for learning a continuous and invertible flow to interpolate between two distributions, and in particular to generate data from noise. Inspired by the variational nature of the diffusion process as a gradient flow, we introduce a stepwise FM model called Local Flow Matching (LFM), which consecutively learns a sequence of FM sub-models, each matching a diffusion process up to the time of the step size in the data-to-noise direction. In each step, the two distributions to be interpolated by the sub-flow model are closer to each other than data vs. noise, and this enables the use of smaller models with faster training. This variational perspective also allows us to theoretically prove a generation guarantee of the proposed flow model in terms of the $\chi^2$-divergence between the generated and true data distributions, utilizing the…
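The stepwise construction described in the abstract can be illustrated with a small sketch of how training data for a single sub-model might be formed. This is not the authors' code: it assumes an OU process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ as the diffusion, a linear interpolant between the two endpoints, one-dimensional samples, and hypothetical helper names (`ou_step`, `fm_pair`, `fm_loss`) chosen only for illustration.

```python
import math
import random

def ou_step(x, gamma, rng):
    """Advance 1-D samples by time gamma under the (assumed) OU process
    dX = -X dt + sqrt(2) dW, using its exact Gaussian transition."""
    decay = math.exp(-gamma)
    noise_scale = math.sqrt(1.0 - math.exp(-2.0 * gamma))
    return [decay * xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x]

def fm_pair(x_l, x_r, t):
    """Linear interpolant between paired endpoints and its target velocity.
    The linear path is an assumption; other interpolants are possible."""
    x_t = [(1.0 - ti) * a + ti * b for a, b, ti in zip(x_l, x_r, t)]
    target = [b - a for a, b in zip(x_l, x_r)]
    return x_t, target

def fm_loss(velocity_fn, x_l, x_r, rng):
    """Monte Carlo estimate of the squared-error flow matching loss
    for a candidate velocity field velocity_fn(x, t)."""
    t = [rng.random() for _ in x_l]
    x_t, target = fm_pair(x_l, x_r, t)
    errs = [(velocity_fn(xi, ti) - vi) ** 2 for xi, ti, vi in zip(x_t, t, target)]
    return sum(errs) / len(errs)

# One local step: toy "data" samples on the left, slightly noised samples on the right.
rng = random.Random(0)
x_l = [3.0 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(256)]
x_r = ou_step(x_l, 0.5, rng)

# Loss of a trivial zero velocity field, as a placeholder for a trained sub-model.
loss = fm_loss(lambda x, t: 0.0, x_l, x_r, rng)
```

In a full pipeline under these assumptions, a sub-model would be fitted to each step in turn, with the output of one step serving as the left endpoint of the next, and generation would run the learned sub-flows in reverse order from noise back to data.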
Peer Reviews
Decision: Submitted to ICLR 2025
The authors propose Local Flow Matching (LFM), which divides Flow Matching into a sequence of local parts. The approach is intuitively correct. The authors provide a theoretical analysis of the convergence of the forward process, showing that the $\chi^2$-divergence between the noise distribution and the noised real data distribution decreases exponentially w.r.t. the number of sub-flows. Furthermore, the authors present theoretical results showing the $\chi^2$ divergence between the r
The experiments indicate that the proposed LFM does not perform as well as the original Flow Matching (FM) (Lipman et al., 2023). The proposed method achieved an FID of 8.45 on the CIFAR-10 dataset, whereas FM achieved a lower FID of 6.35 on the same dataset. On the ImageNet 32x32 dataset, the proposed method achieved an FID of 7.0, whereas FM achieved a lower FID of 5.02. The authors should compare with FM on the CIFAR-10, ImageNet 32x32, Oxford Flowers, and LSUN Church datasets.
The paper explores integrating flow-matching submodels into diffusion processes to enable faster and more efficient learning and inference. The approach is novel and is supported by theoretical guarantees of generation. Experimentally, the method can be applied to various tasks, including image generation, tabular data generation, and robotic manipulation.
1. More details should be provided, such as the number of function evaluations (NFEs) and the ODE sampler used for all methods in Table 1 and Table 2, to better demonstrate LFM's effectiveness.
2. Although the tasks are diverse, the paper lacks a solid comparison with prior methods on some fundamental tasks. For instance, comparing LFM with Rectified Flow and OU diffusion on CIFAR-10 with the same number of batches would offer a clearer understanding of LFM's training efficiency and distillatio
1. The paper offers a fresh approach in the field of generative modeling, combining ideas from diffusion processes and FM. The idea of breaking down a single large flow into several smaller steps (local flows) is natural.
2. The paper provides solid theoretical guarantees, specifically proving a generation guarantee in terms of the $\chi^2$-divergence between the generated and true data distributions. The experiments are comprehensive, covering a range of tasks from image generation to robotic m
1. FM simplifies the diffusion process into a single step, transforming a trajectory from a curve into a straight line, thus reducing training costs and improving sampling efficiency. However, this paper reverses that advantage by breaking this single step into multiple segments. The division of the trajectory into multiple steps could introduce added computational complexity, potentially negating the efficiency gains that FM originally aimed to provide.
2. The paper presents some conceptual am
Taxonomy
Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Reservoir Engineering and Simulation Methods
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
