Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

Hui Yuan; Kaixuan Huang; Chengzhuo Ni; Minshuo Chen; Mengdi Wang

arXiv:2307.07055 · cs.LG · July 17, 2023 · 2 cites

TL;DR

This paper studies reward-directed generation with a conditional diffusion model. The model learns the reward-conditioned data distribution and generates samples whose rewards move closer to a user-specified target, with supporting theoretical analysis and empirical validation.

Contribution

The paper presents a diffusion-based method for reward-directed generation, with provable guarantees on distribution learning and reward improvement.

Findings

01. The model can learn and sample from reward-conditioned distributions.

02. It recovers the latent data subspace effectively.

03. Reward improvements depend on reward signal strength and distribution shift.

Abstract

We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in generative AI, reinforcement learning, and computational biology. We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels. Our approach leverages a learned reward function on the smaller data set as a pseudolabeler. From a theoretical standpoint, we show that this directed generator can effectively learn and sample from the reward-conditioned data distribution. Additionally, our model is capable of recovering the latent subspace representation of data. Moreover, we establish that the model generates a new population that moves closer to a user-specified target reward…
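The pseudolabeling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear reward model, the ridge penalty, the synthetic subspace data, and the names `fit_linear_reward` and `pseudolabel` are all assumptions made for the example. It fits a reward predictor on the small labeled set, then labels the larger unlabeled set so that each sample is paired with a predicted reward, which is the form of data a reward-conditioned diffusion model would be trained on.

```python
import numpy as np


def fit_linear_reward(x_labeled, y_labeled, ridge=1e-3):
    """Fit a ridge-regression reward model (assumed linear for illustration)."""
    d = x_labeled.shape[1]
    gram = x_labeled.T @ x_labeled + ridge * np.eye(d)
    return np.linalg.solve(gram, x_labeled.T @ y_labeled)


def pseudolabel(x_unlabeled, theta):
    """Predict rewards for unlabeled samples with the learned reward model."""
    return x_unlabeled @ theta


# Synthetic setup (hypothetical): data lies in a 4-dim subspace of R^16.
rng = np.random.default_rng(0)
ambient_dim, latent_dim = 16, 4
basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, latent_dim)))
theta_true = basis @ rng.standard_normal(latent_dim)


def sample_x(n):
    """Draw n samples x = basis @ z with standard Gaussian latent z."""
    return rng.standard_normal((n, latent_dim)) @ basis.T


# Small set with noisy reward labels, larger set without labels.
x_labeled = sample_x(200)
y_labeled = x_labeled @ theta_true + 0.1 * rng.standard_normal(200)
x_unlabeled = sample_x(2000)

# Learn the reward on the labeled set, then pseudolabel the unlabeled set.
theta_hat = fit_linear_reward(x_labeled, y_labeled)
y_pseudo = pseudolabel(x_unlabeled, theta_hat)

# Each row is (x, predicted reward): training pairs for a conditional model.
train_pairs = np.column_stack([x_unlabeled, y_pseudo])
```

Training the conditional diffusion model on `train_pairs` and sampling at a chosen target reward are omitted here; see the linked repository for the authors' code.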

Code & Models

Repositories

kaffaljidhmah2/rcgdm (PyTorch)

Videos

Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement · SlidesLive

Taxonomy

Topics: Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference

Methods: Diffusion