Better Source, Better Flow: Learning Condition-Dependent Source Distribution for Flow Matching
Junwan Kim, Jiho Park, Seonghu Jeon, Seungryong Kim

TL;DR
This paper introduces a method for learning condition-dependent source distributions for flow matching in text-to-image generation. Treating the source distribution as an optimization target yields faster convergence and improved quality.
Contribution
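It proposes an approach for designing and learning source distributions conditioned on the input, addressing the distributional collapse and instability that arise when conditioning is incorporated directly into the source, and improving flow matching performance in generative models.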
Findings
Up to 3x faster convergence, as measured by FID.
Improved stability with variance regularization.
Enhanced performance across multiple benchmarks.
Abstract
Flow matching has recently emerged as a promising alternative to diffusion-based generative models, particularly for text-to-image generation. Despite its flexibility in allowing arbitrary source distributions, most existing approaches rely on a standard Gaussian distribution, a choice inherited from diffusion models, and rarely consider the source distribution itself as an optimization target in such settings. In this work, we show that principled design of the source distribution is not only feasible but also beneficial at the scale of modern text-to-image systems. Specifically, we propose learning a condition-dependent source distribution under a flow matching objective that better exploits rich conditioning signals. We identify key failure modes that arise when directly incorporating conditioning into the source, including distributional collapse and instability, and show that…
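To make the idea concrete, here is a minimal sketch of one flow matching training step with a condition-dependent Gaussian source. It is an illustration under stated assumptions, not the authors' implementation: the diagonal-Gaussian parameterization, the linear interpolation path, and the names `sample_source`, `interpolate`, `flow_matching_loss`, and `variance_penalty` are all hypothetical, and the particular form of the variance regularizer is a guess, since this summary does not specify it. The values `mu` and `log_sigma` stand in for what a network would predict from the text condition.

```python
import math
import random


def sample_source(mu, log_sigma, rng):
    """Draw x0 ~ N(mu, diag(sigma^2)) via reparameterization: x0 = mu + sigma * eps."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_sigma)]


def interpolate(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 (a common flow matching choice)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the linear path: d x_t / dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(pred_v, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    target = target_velocity(x0, x1)
    return sum((p - g) ** 2 for p, g in zip(pred_v, target)) / len(target)


def variance_penalty(log_sigma, target_log_sigma=0.0):
    """Hypothetical regularizer: penalize log-std drifting away from a target,
    which discourages the learned source from collapsing to a point."""
    return sum((s - target_log_sigma) ** 2 for s in log_sigma) / len(log_sigma)


# Toy example with 2-D "images". In practice mu and log_sigma would be
# predicted from the text condition by a learned network (assumption).
rng = random.Random(0)
mu = [0.5, -0.2]
log_sigma = [0.0, -3.0]   # second dimension is close to collapsed
x1 = [1.0, 1.0]           # data sample paired with the condition

x0 = sample_source(mu, log_sigma, rng)
t = rng.random()
x_t = interpolate(x0, x1, t)          # input the velocity model would see

# A model that predicts the exact target velocity has zero loss.
perfect_loss = flow_matching_loss(target_velocity(x0, x1), x0, x1)
penalty = variance_penalty(log_sigma)
print(perfect_loss, penalty)
```

In this sketch the total objective would be the flow matching loss plus a weighted `variance_penalty`. With a standard Gaussian source, `mu` is all zeros and `log_sigma` is all zeros, so the penalty vanishes and the step reduces to ordinary flow matching.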
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
