Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation

Yidong Ouyang; Liyan Xie; Hongyuan Zha; Guang Cheng

arXiv:2602.00413 · stat.ML · February 3, 2026

Open Access

TL;DR

This paper introduces a novel, computationally efficient framework for aligning diffusion and flow matching models in text-to-image generation by leveraging reward-weighted distributions, avoiding extensive fine-tuning.

Contribution

The paper proposes a fine-tuning-free guidance network for diffusion models and a training-free approach for flow matching, improving generation quality at reduced computational cost.

Findings

01. Achieves performance comparable to fine-tuning methods with 60% less computation.

02. Introduces a guidance network for diffusion models that reduces artifacts.

03. Provides a training-free method for flow matching that improves image quality.

Abstract

Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function, these approaches require extensive computational resources and may not generalize well across different objectives. In this work, we propose a novel alignment framework by leveraging the underlying nature of the alignment problem -- sampling from reward-weighted distributions -- and show that it applies to both diffusion models (via score guidance) and flow matching models (via velocity guidance). The score function (velocity field) required for the reward-weighted distribution can be decomposed into the pre-trained score (velocity field) plus a conditional expectation of the reward. For the alignment on the diffusion model, we identify a fundamental…
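
The decomposition stated in the abstract (target score equals pre-trained score plus a term built from a conditional expectation of the reward) can be checked numerically on a toy case. The sketch below is not the paper's method: it assumes an exponentially tilted target p*(x_0) proportional to p(x_0) exp(r(x_0)/beta), a standard normal "pre-trained" distribution, a linear reward r(x_0) = a * x_0, and a forward noising step x_t = alpha * x_0 + sigma * eps. The temperature beta, the reward form, and all function names are illustrative assumptions that do not appear in the listing.

```python
import numpy as np


def gauss_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def log_cond_expectation(x_t, alpha, sigma, a, beta, grid):
    """log E[exp(r(x_0)/beta) | x_t] for the assumed reward r(x_0) = a * x_0.

    The posterior p(x_0 | x_t) is proportional to p(x_0) * p(x_t | x_0);
    it is normalised on a uniform grid and integrated with a Riemann sum.
    """
    dx = grid[1] - grid[0]
    post = gauss_pdf(grid, 0.0, 1.0) * gauss_pdf(x_t, alpha * grid, sigma ** 2)
    post = post / (post.sum() * dx)
    weights = np.exp(a * grid / beta)
    return np.log((weights * post).sum() * dx)


def guidance_term(x_t, alpha, sigma, a, beta, grid, eps=1e-4):
    """Central finite difference of the log conditional expectation in x_t."""
    hi = log_cond_expectation(x_t + eps, alpha, sigma, a, beta, grid)
    lo = log_cond_expectation(x_t - eps, alpha, sigma, a, beta, grid)
    return (hi - lo) / (2.0 * eps)


def pretrained_score(x_t, alpha, sigma):
    """Score of the noised base distribution N(0, alpha^2 + sigma^2)."""
    return -x_t / (alpha ** 2 + sigma ** 2)


def target_score(x_t, alpha, sigma, a, beta):
    """Closed-form score of the noised tilted distribution.

    Tilting N(0, 1) by exp(a * x_0 / beta) gives N(a / beta, 1), so the
    noised target is N(alpha * a / beta, alpha^2 + sigma^2).
    """
    return -(x_t - alpha * a / beta) / (alpha ** 2 + sigma ** 2)


# Illustrative values only.
alpha, sigma, a, beta = 0.8, 0.6, 1.5, 2.0
grid = np.linspace(-12.0, 12.0, 4001)
x_t = 0.3

lhs = target_score(x_t, alpha, sigma, a, beta)
rhs = pretrained_score(x_t, alpha, sigma) + guidance_term(
    x_t, alpha, sigma, a, beta, grid
)
print("target score:", lhs)
print("pre-trained score + guidance term:", rhs)
```

In this Gaussian toy case both sides are available in closed form, which is what makes the check possible. For an image model the conditional expectation is not tractable, which is presumably why the paper turns to a guidance network for diffusion models and a training-free estimate for flow matching.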

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Digital Humanities and Scholarship