Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning

Hanyang Zhao; Haoxian Chen; Ji Zhang; David D. Yao; Wenpin Tang

arXiv:2502.01819·cs.LG·August 25, 2025

TL;DR

This paper introduces a continuous-time reinforcement learning approach to fine-tune diffusion models, reducing discretization errors and improving alignment with input prompts, demonstrated on large-scale Text2Image models.

Contribution

It develops a novel continuous-time RL framework for diffusion model fine-tuning, connecting score matching with policy optimization and regularization.

Findings

01. Improved fine-tuning of diffusion models with continuous-time RL.
02. Enhanced value network design leveraging diffusion model structure.
03. Validated on large-scale Text2Image models.

Abstract

Reinforcement learning from human feedback (RLHF), which aligns a diffusion model with the input prompt, has become a crucial step in building reliable generative AI models. Most work in this area uses a discrete-time formulation, which is prone to discretization errors and is often not applicable to models with higher-order or black-box solvers. The objective of this study is to develop a disciplined approach to fine-tuning diffusion models using continuous-time RL, formulated as a stochastic control problem with a reward function that aligns the end result (terminal state) with the input prompt. The key idea is to treat score matching as controls or actions, thereby making connections to policy optimization and regularization in continuous-time RL. To carry out this idea, we lay out a new policy optimization framework for continuous-time RL, and illustrate its potential in enhancing the…
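
The stochastic control formulation described in the abstract can be sketched as follows. This is an illustrative form in standard score-based diffusion notation, not the paper's exact statement. All symbols are assumptions introduced here: f and g are the drift and diffusion coefficients of a forward noising SDE on [0, T], a_t is the action that takes the place of the score in the reverse-time sampling SDE, c is the prompt, r is a terminal reward, s^pre is the pretrained score, and β > 0 is a regularization weight.

```latex
% Sampling dynamics: reverse-time SDE with the score replaced by an action a_t
% (assumed notation; Y_0 is drawn from the noise distribution)
\[
  dY_t = \bigl[-f(T-t, Y_t) + g^2(T-t)\, a_t\bigr]\, dt + g(T-t)\, dB_t,
  \qquad t \in [0, T].
\]

% Objective: terminal reward on Y_T, minus a penalty for deviating from the
% pretrained score. By Girsanov's theorem, the running cost below equals the
% KL divergence between the controlled and pretrained path measures when both
% share the same diffusion coefficient and initial distribution.
\[
  \max_{a}\; \mathbb{E}\!\left[\, r(Y_T, c)
    \;-\; \beta \int_0^T \frac{g^2(T-t)}{2}\,
      \bigl\| a_t - s^{\mathrm{pre}}(T-t, Y_t, c) \bigr\|^2 \, dt \right].
\]
```

In this sketch the reward depends only on the terminal state Y_T, matching the abstract's description, and the quadratic running cost is one common choice of regularizer; the paper's own regularization may differ.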

Taxonomy

Topics: Neural Networks and Applications · Stock Market Forecasting Methods · Reinforcement Learning in Robotics

Methods: Diffusion