Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization
Jiajun Fan, Shuaike Shen, Chaoran Cheng, Yuxin Chen, Chumeng Liang, Ge Liu

TL;DR
This paper introduces ORW-CFM-W2, a novel RL fine-tuning method for flow-based generative models that balances reward optimization and diversity using Wasserstein regularization, without requiring reward gradients.
Contribution
It presents a theoretically grounded, online reward-weighted flow matching approach with Wasserstein-2 regularization, enabling efficient, reward-aligned fine-tuning of continuous flow models.
Findings
Effective in target image generation, image compression, and text-image alignment.
Converges toward the optimal policy while allowing a controllable trade-off between reward and diversity.
Maintains diversity and prevents policy collapse during fine-tuning.
Abstract
Recent advancements in reinforcement learning (RL) have achieved great success in fine-tuning diffusion-based generative models. However, fine-tuning continuous flow-based generative models to align with arbitrary user-defined reward functions remains challenging, particularly due to issues such as policy collapse from overoptimization and the prohibitively high computational cost of likelihoods in continuous-time flows. In this paper, we propose an easy-to-use and theoretically sound RL fine-tuning method, which we term Online Reward-Weighted Conditional Flow Matching with Wasserstein-2 Regularization (ORW-CFM-W2). Our method integrates RL into the flow matching framework to fine-tune generative models with arbitrary reward functions, without relying on gradients of rewards or filtered datasets. By introducing an online reward-weighting mechanism, our approach guides the model to…
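The objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The linear interpolation path, the exponential weighting `exp(tau * r)`, the weight normalization, the squared vector-field distance used as a stand-in for the Wasserstein-2 term, and the names `reward_weighted_cfm_loss`, `tau`, and `alpha` are all assumptions made for illustration.

```python
import numpy as np

def reward_weighted_cfm_loss(v_theta, v_ref, x0, x1, t, rewards, tau=1.0, alpha=0.1):
    """Hypothetical sketch of a reward-weighted CFM loss with a reference-model penalty.

    v_theta, v_ref: callables (x_t, t) -> velocity array of shape (batch, dim)
    x0: noise samples, shape (batch, dim)
    x1: samples drawn online from the current model, shape (batch, dim)
    t: times in [0, 1], shape (batch,)
    rewards: scalar reward per sample, shape (batch,)
    """
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1        # assumed linear probability path
    target = x1 - x0                      # conditional velocity of that path

    # Assumed exponential reward weighting; subtracting the max only avoids
    # overflow and cancels out after normalization.
    w = np.exp(tau * (rewards - rewards.max()))
    w = w / w.mean()

    pred = v_theta(x_t, t)
    cfm = np.sum((pred - target) ** 2, axis=1)            # per-sample CFM error
    # Assumed surrogate for the W2 term: distance to the reference vector field.
    reg = np.sum((pred - v_ref(x_t, t)) ** 2, axis=1)
    return float(np.mean(w * cfm) + alpha * np.mean(reg))
```

In this sketch, a larger `tau` concentrates the loss on high-reward samples, while a larger `alpha` keeps the fine-tuned vector field closer to the reference model, which is one way to picture the reward-diversity trade-off mentioned above. Note that no gradient of the reward is needed: rewards enter only as per-sample weights.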
Taxonomy
Topics: Lattice Boltzmann Simulation Studies · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
