PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

Fei Deng; Qifei Wang; Wei Wei; Matthias Grundmann; Tingbo Hou

arXiv:2402.08714 · cs.LG · March 29, 2024 · 2 cites

TL;DR

PRDP introduces a stable, supervised reward difference prediction method for large-scale reward finetuning of diffusion models, outperforming RL-based methods on complex, unseen prompts in vision tasks.

Contribution

The paper proposes PRDP, a novel reward difference prediction approach that stabilizes large-scale reward finetuning of diffusion models, enabling better generalization to complex prompts.

Findings

01. PRDP matches RL methods in small-scale reward maximization.

02. PRDP outperforms RL in large-scale training on unseen prompts.

03. PRDP achieves higher quality image generation on diverse prompts.

Abstract

Reward finetuning has emerged as a promising approach to aligning foundation models with downstream objectives. Remarkable success has been achieved in the language domain by using reinforcement learning (RL) to maximize rewards that reflect human preference. However, in the vision domain, existing RL-based reward finetuning methods are limited by their instability in large-scale training, rendering them incapable of generalizing to complex, unseen prompts. In this paper, we propose Proximal Reward Difference Prediction (PRDP), enabling stable black-box reward finetuning for diffusion models for the first time on large-scale prompt datasets with over 100K prompts. Our key innovation is the Reward Difference Prediction (RDP) objective that has the same optimal solution as the RL objective while enjoying better training stability. Specifically, the RDP objective is a supervised regression…
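
The abstract is cut off before it spells out the RDP objective, so the following is only a minimal sketch of what a reward difference regression could look like, not the authors' implementation. It assumes the common KL-regularized form in which a trajectory's implied reward is `beta` times the summed log-probability ratio between the finetuned model and a frozen reference model. The function names, the `beta` parameter, and the per-step log-probability inputs are all hypothetical, and the "proximal" part of PRDP is not modeled here.

```python
from typing import Sequence, Tuple

# A trajectory is represented (hypothetically) as two equal-length lists:
# per-step log-probabilities under the finetuned model and under the reference model.
Trajectory = Tuple[Sequence[float], Sequence[float]]


def implied_reward(logp_model: Sequence[float],
                   logp_ref: Sequence[float],
                   beta: float) -> float:
    """Assumed form: beta * sum over denoising steps of (log p_model - log p_ref)."""
    if len(logp_model) != len(logp_ref):
        raise ValueError("both log-probability sequences must cover the same steps")
    return beta * sum(m - r for m, r in zip(logp_model, logp_ref))


def rdp_loss(traj_a: Trajectory,
             traj_b: Trajectory,
             reward_a: float,
             reward_b: float,
             beta: float = 1.0) -> float:
    """Squared error between the predicted and the observed reward difference
    for one pair of images generated from the same prompt."""
    predicted_diff = implied_reward(*traj_a, beta) - implied_reward(*traj_b, beta)
    observed_diff = reward_a - reward_b
    return (predicted_diff - observed_diff) ** 2


# Toy pair of two-step trajectories with made-up numbers.
traj_a = ([-1.0, -2.0], [-1.5, -2.5])   # model is more likely than reference
traj_b = ([-1.0, -1.0], [-1.0, -1.0])   # model equals reference
loss = rdp_loss(traj_a, traj_b, reward_a=0.75, reward_b=0.25)
print(loss)
```

Under these assumptions the loss is zero exactly when the model's log-ratio difference reproduces the observed reward difference, which is why such an objective can be trained as plain supervised regression with a black-box reward: only reward values are needed, never reward gradients.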

Taxonomy

Topics: Neural Networks and Applications

Methods: Sparse Evolutionary Training · Diffusion