VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL

Fengyuan Dai; Zifeng Zhuang; Yufei Huang; Siteng Huang; Bangyan Liao; Donglin Wang; Fajie Yuan

arXiv:2505.15791 · cs.CV · June 3, 2025

TL;DR

VARD introduces a value-based reinforcement learning method that provides dense, differentiable supervision for fine-tuning diffusion models, resulting in more stable training and improved generation quality for complex tasks.

Contribution

The paper proposes a value-function-based approach with KL regularization that enables efficient, stable, and reward-aligned fine-tuning of diffusion models.

Findings

01 Enhanced trajectory guidance in diffusion models

02 Improved training efficiency and stability

03 Extended RL applicability to complex, non-differentiable rewards

Abstract

Diffusion models have emerged as powerful generative tools across various domains, yet tailoring pre-trained models to exhibit specific desirable properties remains challenging. While reinforcement learning (RL) offers a promising solution, current methods struggle to simultaneously achieve stable, efficient fine-tuning and support non-differentiable rewards. Furthermore, their reliance on sparse rewards provides inadequate supervision during intermediate steps, often resulting in suboptimal generation quality. To address these limitations, dense and differentiable signals are required throughout the diffusion process. Hence, we propose VAlue-based Reinforced Diffusion (VARD): a novel approach that first learns a value function predicting the expectation of rewards from intermediate states, and subsequently uses this value function with KL regularization to provide dense supervision throughout…
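The two stages named in the abstract can be written out as a rough sketch. The notation and exact loss forms below are assumptions made for illustration, since the abstract is truncated and does not give them; the paper's actual objective may differ.

```latex
% Assumed notation (not from the abstract): x_t is the intermediate state at
% diffusion step t, x_0 the final sample, r the (possibly non-differentiable)
% reward, V_\phi the learned value function, p_\theta the model being
% fine-tuned, p_{\mathrm{ref}} the frozen pre-trained model, and \beta > 0
% a hypothetical KL weight.

% Stage 1 (assumed form): regress the value function onto the final reward,
% so that it approximates the expected reward given an intermediate state.
V_\phi(x_t, t) \approx \mathbb{E}\left[ r(x_0) \mid x_t \right],
\qquad
\min_\phi \; \mathbb{E}_{t,\, x_t,\, x_0}
  \left[ \left( V_\phi(x_t, t) - r(x_0) \right)^2 \right]

% Stage 2 (assumed form): fine-tune the model to raise the value at every
% step while staying close to the pre-trained model through a KL penalty.
\max_\theta \; \mathbb{E}_{t,\, x_t}
  \left[
    \mathbb{E}_{x_{t-1} \sim p_\theta(\cdot \mid x_t)}
      \left[ V_\phi(x_{t-1}, t-1) \right]
    - \beta \, D_{\mathrm{KL}}\!\left(
        p_\theta(\cdot \mid x_t) \,\|\, p_{\mathrm{ref}}(\cdot \mid x_t)
      \right)
  \right]
```

Under these assumptions, the squared-error regression in stage 1 is minimized by the conditional expectation of the reward, which is why only sampled values of r are needed and r itself need not be differentiable. The supervision in stage 2 is dense because V_\phi is evaluated at every intermediate step rather than only at the final sample.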

Taxonomy

Topics: Matrix Theory and Algorithms · Numerical methods for differential equations · Model Reduction and Neural Networks

Methods: Diffusion