Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement
Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang

TL;DR
This paper introduces a reward-directed conditional diffusion model that learns the reward-conditioned data distribution and generates samples whose rewards move toward a user-specified target, supported by theoretical analysis and empirical validation.
Contribution
It presents a novel diffusion-based method for reward-directed generation with provable distribution learning and reward improvement capabilities.
Findings
The model can learn and sample from reward-conditioned distributions.
It recovers the latent data subspace effectively.
Reward improvements depend on reward signal strength and distribution shift.
Abstract
We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in generative AI, reinforcement learning, and computational biology. We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels. Our approach leverages a learned reward function on the smaller data set as a pseudolabeler. From a theoretical standpoint, we show that this directed generator can effectively learn and sample from the reward-conditioned data distribution. Additionally, our model is capable of recovering the latent subspace representation of data. Moreover, we establish that the model generates a new population that moves closer to a user-specified target reward…
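The pseudolabeling step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the linear reward model, the ridge-regression fit, the synthetic Gaussian data, and all names (`fit_reward_ridge`, `pseudolabel`, `theta_true`) are assumptions made for illustration. It covers only how a reward function learned on the small labeled set might label the larger unlabeled set, producing the (sample, pseudo-reward) pairs on which a conditional diffusion model would then be trained. The diffusion training itself is not shown.

```python
import numpy as np


def fit_reward_ridge(x_labeled, y_labeled, lam=1.0):
    """Fit a linear reward model by ridge regression (an assumed model class).

    Solves (X^T X + lam * I) theta = X^T y for theta.
    """
    dim = x_labeled.shape[1]
    gram = x_labeled.T @ x_labeled + lam * np.eye(dim)
    return np.linalg.solve(gram, x_labeled.T @ y_labeled)


def pseudolabel(x_unlabeled, theta):
    """Predict a reward for each unlabeled sample using the learned model."""
    return x_unlabeled @ theta


# Synthetic stand-in data: a small labeled set with noisy rewards
# and a larger unlabeled set, mirroring the setting in the abstract.
rng = np.random.default_rng(0)
dim, n_labeled, n_unlabeled = 8, 50, 500
theta_true = rng.normal(size=dim)  # hypothetical ground-truth reward direction

x_lab = rng.normal(size=(n_labeled, dim))
y_lab = x_lab @ theta_true + 0.1 * rng.normal(size=n_labeled)  # noisy labels
x_unl = rng.normal(size=(n_unlabeled, dim))

# Learn the reward on the small labeled set, then label the large set.
theta_hat = fit_reward_ridge(x_lab, y_lab, lam=1e-2)
y_pseudo = pseudolabel(x_unl, theta_hat)

# Each row is (x, pseudo-reward): the training pairs for a conditional model.
pairs = np.column_stack([x_unl, y_pseudo])
print(pairs.shape)
```

At sampling time, a conditional model trained on such pairs would be queried with the target reward value as the conditioning input. How far that target can sit from the rewards seen in training is the distribution-shift question the Findings refer to.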
Taxonomy
Topics: Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference
Methods: Diffusion
