Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization

Jiajun Fan; Shuaike Shen; Chaoran Cheng; Yuxin Chen; Chumeng Liang; Ge Liu

arXiv:2502.06061 · cs.LG · February 11, 2025

TL;DR

This paper introduces ORW-CFM-W2, a novel RL fine-tuning method for flow-based generative models that balances reward optimization and diversity using Wasserstein regularization, without requiring reward gradients.

Contribution

It presents a theoretically grounded, online reward-weighted flow matching approach with Wasserstein-2 regularization, enabling efficient, reward-aligned fine-tuning of continuous flow models.

Findings

01. Effective in target image generation, image compression, and text-image alignment.

02. Achieves optimal policy convergence with controllable reward-diversity trade-offs.

03. Maintains diversity and prevents policy collapse during fine-tuning.

Abstract

Recent advancements in reinforcement learning (RL) have achieved great success in fine-tuning diffusion-based generative models. However, fine-tuning continuous flow-based generative models to align with arbitrary user-defined reward functions remains challenging, particularly due to issues such as policy collapse from overoptimization and the prohibitively high computational cost of likelihoods in continuous-time flows. In this paper, we propose an easy-to-use and theoretically sound RL fine-tuning method, which we term Online Reward-Weighted Conditional Flow Matching with Wasserstein-2 Regularization (ORW-CFM-W2). Our method integrates RL into the flow matching framework to fine-tune generative models with arbitrary reward functions, without relying on gradients of rewards or filtered datasets. By introducing an online reward-weighting mechanism, our approach guides the model to…
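The objective described in the abstract can be illustrated with a minimal sketch. The abstract is truncated here, so the details below are assumptions rather than the authors' implementation: exponential reward weights `exp(tau * r)`, a linear interpolation path between noise and data, scalar (1-D) samples, and a squared difference between the fine-tuned and reference velocity fields as a tractable stand-in for the Wasserstein-2 regularizer. The names `reward_weights`, `orw_cfm_w2_loss`, `tau`, and `alpha` are hypothetical.

```python
import math
import random


def reward_weights(rewards, tau):
    """Self-normalized weights proportional to exp(tau * r), with mean 1.

    Assumption: exponential weighting; tau controls how sharply
    high-reward samples are emphasized.
    """
    m = max(rewards)  # subtract the max for numerical stability
    unnorm = [math.exp(tau * (r - m)) for r in rewards]
    total = sum(unnorm)
    n = len(rewards)
    return [n * u / total for u in unnorm]


def orw_cfm_w2_loss(v_theta, v_ref, samples, rewards, tau, alpha, rng):
    """Monte Carlo estimate of a reward-weighted flow matching loss
    plus alpha times a velocity-difference penalty.

    v_theta, v_ref: callables (t, x) -> velocity for scalar x.
    samples: points that, in the online setting, would be drawn from
             the current fine-tuned model (not from a fixed dataset).
    rewards: reward value of each sample; no reward gradient is used.
    """
    weights = reward_weights(rewards, tau)
    total = 0.0
    for x1, w in zip(samples, weights):
        t = rng.random()              # time in [0, 1)
        x0 = rng.gauss(0.0, 1.0)      # noise sample
        xt = (1.0 - t) * x0 + t * x1  # point on the linear path (assumption)
        target = x1 - x0              # conditional velocity of that path
        v = v_theta(t, xt)
        fit = w * (v - target) ** 2                # reward-weighted regression term
        drift = alpha * (v - v_ref(t, xt)) ** 2    # penalty for leaving the reference
        total += fit + drift
    return total / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    v_ref = lambda t, x: 0.5 * x   # hypothetical pretrained velocity field
    v_new = lambda t, x: 0.6 * x   # hypothetical fine-tuned velocity field
    xs = [1.0, 2.0, -0.5]          # stand-ins for model samples
    rs = [0.1, 0.9, 0.0]           # stand-ins for reward values
    print(orw_cfm_w2_loss(v_new, v_ref, xs, rs, tau=1.0, alpha=0.5, rng=rng))
```

In this sketch, a larger `tau` pushes the weighted term toward high-reward samples, while a larger `alpha` keeps the fine-tuned field closer to the reference one, which is one way to read the reward-diversity trade-off mentioned above.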

Videos

Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization · SlidesLive

Taxonomy

Topics: Lattice Boltzmann Simulation Studies · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications