Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning
Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang

TL;DR
This paper introduces a framework that casts the fine-tuning of diffusion models with reinforcement learning as a continuous-time problem, using reward functions learned from human feedback to improve generation quality.
Contribution
It formulates diffusion model fine-tuning as an exploratory continuous-time stochastic control problem in which the score-matching functions serve as the controls (actions) of an RL framework.
Findings
Framework effectively integrates RL with diffusion models
Theoretical development of continuous-time RL for diffusion models
Improved generation quality demonstrated in experiments
Abstract
Reinforcement learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent, and recent works have also explored it for the alignment of diffusion generative models. In this work, we provide a rigorous treatment by formulating the task of fine-tuning diffusion models, with reward functions learned from human feedback, as an exploratory continuous-time stochastic control problem. Our key idea lies in treating the score-matching functions as controls/actions, and upon this, we develop a unified framework, from a continuous-time perspective, for employing reinforcement learning (RL) algorithms to improve the generation quality of diffusion models. We also develop the corresponding continuous-time RL theory for policy optimization and regularization under the assumption of an environment driven by stochastic differential equations.…
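To make the "scores as actions" idea concrete, here is a minimal sketch, not the paper's algorithm. It simulates a reverse-time SDE with Euler-Maruyama steps, where the action at each step is whatever score value the policy returns. Everything specific is an assumption made for illustration: the one-dimensional Gaussian toy data, the constant noise rate `BETA`, the horizon `T`, the helper names `reference_score` and `rollout`, and the quadratic running penalty, which is the standard path-space KL term between two SDEs sharing the same diffusion coefficient.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
BETA = 1.0             # constant noise rate of the forward SDE
T = 5.0                # time horizon
MU, SIGMA = 2.0, 0.5   # toy data distribution N(MU, SIGMA^2)


def reference_score(t, x):
    """Exact score of the forward marginal at time t for the Gaussian toy data.

    Forward SDE (assumed): dX = -0.5*BETA*X dt + sqrt(BETA) dW.
    This plays the role of the pre-trained score model.
    """
    decay = np.exp(-0.5 * BETA * t)
    mean = MU * decay
    var = SIGMA**2 * decay**2 + 1.0 - decay**2
    return -(x - mean) / var


def rollout(policy, n_samples=20000, n_steps=500, seed=0):
    """Simulate the controlled reverse-time SDE with Euler-Maruyama.

    `policy(t, y)` returns the action, i.e. the score value plugged into the
    reverse drift 0.5*BETA*y + BETA*action. Returns the terminal samples and
    the accumulated penalty 0.5*BETA*(action - reference)^2 dt per sample.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = rng.standard_normal(n_samples)   # start from the N(0, 1) prior
    penalty = np.zeros(n_samples)
    for k in range(n_steps):
        t = T - k * dt                   # forward-time label of the current state
        action = policy(t, y)
        ref = reference_score(t, y)
        penalty += 0.5 * BETA * (action - ref) ** 2 * dt
        drift = 0.5 * BETA * y + BETA * action
        noise = rng.standard_normal(n_samples)
        y = y + drift * dt + np.sqrt(BETA * dt) * noise
    return y, penalty
```

Passing `reference_score` itself as the policy yields zero penalty and terminal samples close to the toy data distribution. A perturbed policy, for example `lambda t, y: reference_score(t, y) + 0.5`, shifts the samples and accumulates a positive penalty, which is the trade-off a terminal reward minus a regularization term would have to balance.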
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Diffusion
