Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

Hanyang Zhao; Haoxian Chen; Ji Zhang; David D. Yao; Wenpin Tang

arXiv:2409.08400·cs.LG·September 16, 2024

TL;DR

This paper introduces a framework that casts the fine-tuning of diffusion models as a continuous-time reinforcement learning problem, using rewards learned from human feedback to improve generation quality.

Contribution

It formulates diffusion model fine-tuning as a continuous-time stochastic control problem in which the score-matching functions serve as the control actions, giving a unified RL framework.

Findings

01 Framework effectively integrates RL with diffusion models

02 Theoretical development of continuous-time RL for diffusion models

03 Improved generation quality demonstrated in experiments

Abstract

Reinforcement learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent, and recent works have also explored it for aligning diffusion generative models. In this work, we provide a rigorous treatment by formulating the task of fine-tuning diffusion models, with reward functions learned from human feedback, as an exploratory continuous-time stochastic control problem. Our key idea lies in treating the score-matching functions as controls/actions. Building on this, we develop a unified framework, from a continuous-time perspective, for employing reinforcement learning (RL) algorithms to improve the generation quality of diffusion models. We also develop the corresponding continuous-time RL theory for policy optimization and regularization, under the assumption of environments driven by stochastic differential equations.…
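To make the "scores as actions" idea concrete, here is a minimal one-dimensional sketch in Python. It is not the paper's algorithm: it only illustrates how a reverse-time SDE sampler can be read as a controlled process whose action at each time is the score value plugged into the drift. Everything in it is an assumption chosen for illustration: the forward process dX = -0.5 X dt + dW, the Gaussian data distribution (so the score is known in closed form), the hypothetical names `gaussian_score`, `rollout`, and `reward`, and the quadratic terminal reward standing in for a reward learned from human feedback.

```python
import numpy as np

T = 8.0  # time horizon of the forward (noising) process; an illustrative choice


def gaussian_score(mu, sigma):
    """Exact score of p_t when X_0 ~ N(mu, sigma^2) and dX = -0.5 X dt + dW."""
    def score(t, x):
        mean_t = mu * np.exp(-0.5 * t)
        var_t = sigma**2 * np.exp(-t) + 1.0 - np.exp(-t)
        return -(x - mean_t) / var_t
    return score


def rollout(action, n_samples=20000, n_steps=400, seed=0):
    """Euler-Maruyama simulation of the reverse-time SDE
    dY = [0.5 Y + a(T - t, Y)] dt + dB, started from N(0, 1),
    where the 'action' a plays the role of the score function."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = rng.standard_normal(n_samples)  # approximate sample from p_T
    for k in range(n_steps):
        t = k * dt
        a = action(T - t, y)  # the control applied at this step
        noise = rng.standard_normal(n_samples)
        y = y + (0.5 * y + a) * dt + np.sqrt(dt) * noise
    return y


def reward(y, target=3.0):
    """Hypothetical terminal reward; a stand-in for a learned reward model."""
    return -(y - target) ** 2


pretrained = gaussian_score(mu=2.0, sigma=0.5)  # "pretrained" policy
shifted = gaussian_score(mu=3.0, sigma=0.5)     # a different policy

y_pre = rollout(pretrained)
y_new = rollout(shifted)
print("mean reward, pretrained action:", reward(y_pre).mean())
print("mean reward, shifted action:   ", reward(y_new).mean())
```

Swapping the action changes the terminal distribution and hence the expected reward, which is the quantity a fine-tuning procedure would try to increase. In practice the action would be a neural score network, and an RL method would typically regularize the fine-tuned policy toward the pretrained one; neither is modeled in this sketch.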

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Diffusion