Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation
Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng

TL;DR
This paper introduces a computationally efficient framework for aligning diffusion and flow matching models in text-to-image generation. It treats alignment as sampling from a reward-weighted distribution, which avoids extensive fine-tuning of the pre-trained model.
Contribution
The paper proposes a fine-tuning-free guidance network for diffusion models and a training-free approach for flow matching, improving generation quality at reduced computational cost.
Findings
Achieves performance comparable to fine-tuning methods with 60% less computation.
Introduces a guidance network for diffusion models that reduces artifacts.
Provides a training-free method for flow matching that improves image quality.
Abstract
Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function, these approaches require extensive computational resources and may not generalize well across different objectives. In this work, we propose a novel alignment framework by leveraging the underlying nature of the alignment problem -- sampling from reward-weighted distributions -- and show that it applies to both diffusion models (via score guidance) and flow matching models (via velocity guidance). The score function (velocity field) required for the reward-weighted distribution can be decomposed into the pre-trained score (velocity field) plus a conditional expectation of the reward. For the alignment on the diffusion model, we identify a fundamental…
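The decomposition mentioned in the abstract (target score = pre-trained score + a reward-dependent guidance term) can be checked numerically on a toy problem. The sketch below is not the paper's method. It assumes a one-dimensional Gaussian "pre-trained" distribution, a quadratic reward, an exponential reward weighting exp(r(x0)/beta) with beta = 1, and additive Gaussian noising, all chosen only because every quantity then has a closed form. The function names (`pretrained_score`, `guidance`, and so on) are hypothetical. Under these assumptions the guidance term is the gradient of log E[exp(r(x0)) | x_t], which the code estimates by quadrature and finite differences.

```python
import numpy as np

# Toy 1-D setup (illustrative assumptions, not taken from the paper):
#   pre-trained distribution   p(x0)  = N(0, 1)
#   reward                     r(x0)  = -(x0 - A)^2 / 2
#   reward-weighted target     p*(x0) ∝ p(x0) * exp(r(x0) / beta), beta = 1
#   forward noising            x_t = x0 + SIGMA * eps, eps ~ N(0, 1)
A = 2.0
SIGMA = 0.7


def reward(x0):
    return -0.5 * (x0 - A) ** 2


def pretrained_score(xt):
    # p_t = N(0, 1 + SIGMA^2), so grad log p_t(x) = -x / (1 + SIGMA^2).
    return -xt / (1.0 + SIGMA**2)


def log_expected_exp_reward(xt, n_grid=20001):
    # log E[exp(r(x0)) | x_t]; the posterior p(x0 | x_t) is N(m, v) here.
    m = xt / (1.0 + SIGMA**2)
    v = SIGMA**2 / (1.0 + SIGMA**2)
    half_width = 10.0 * np.sqrt(v)
    x0 = np.linspace(m - half_width, m + half_width, n_grid)
    dx = x0[1] - x0[0]
    posterior = np.exp(-0.5 * (x0 - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)
    return np.log(np.sum(posterior * np.exp(reward(x0))) * dx)


def guidance(xt, h=1e-4):
    # Central finite difference of the log conditional expectation.
    upper = log_expected_exp_reward(xt + h)
    lower = log_expected_exp_reward(xt - h)
    return (upper - lower) / (2.0 * h)


def guided_score(xt):
    # Pre-trained score plus the reward guidance term.
    return pretrained_score(xt) + guidance(xt)


def exact_target_score(xt):
    # p*(x0) = N(A/2, 1/2), so the noised target is N(A/2, 1/2 + SIGMA^2).
    return -(xt - A / 2.0) / (0.5 + SIGMA**2)


if __name__ == "__main__":
    for xt in (-1.0, 0.0, 1.5):
        print(xt, guided_score(xt), exact_target_score(xt))
```

In this toy case the two printed columns should agree up to numerical error, which illustrates why guidance alone, without changing the pre-trained score, can in principle target the reward-weighted distribution. In a real text-to-image model the conditional expectation has no closed form and has to be approximated.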
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Digital Humanities and Scholarship
