Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control

Masatoshi Uehara; Yulai Zhao; Kevin Black; Ehsan Hajiramezanali; Gabriele Scalia; Nathaniel Lee Diamant; Alex M Tseng; Tommaso Biancalani; Sergey Levine

arXiv:2402.15194 · cs.LG · February 29, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a method to fine-tune diffusion models using entropy-regularized control, enabling goal-directed generation that maintains diversity and mitigates reward collapse, especially when using imperfect reward functions.
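In a generic form, an entropy-regularized objective of this kind is reward maximization with a KL penalty toward the pre-trained model. The block below is a hedged sketch in assumed notation (pre-trained path law P^pre, reward r, regularization strength α > 0, pre-trained drift f, control u, diffusion coefficient σ). It is not necessarily the paper's exact objective. The first line is the standard Gibbs variational identity: the optimum tilts the pre-trained model by exp(r/α), so a small α approaches pure reward maximization and a large α stays close to the pre-trained model. The second part shows how, under standard regularity conditions, the KL term becomes a quadratic control cost for a controlled SDE.

```latex
% Generic KL-regularized objective over path laws (illustrative, notation assumed)
P^{\star} \in \arg\max_{P}\; \mathbb{E}_{P}\big[r(X_T)\big] - \alpha\,\mathrm{KL}\big(P \,\|\, P^{\mathrm{pre}}\big),
\qquad
\frac{dP^{\star}}{dP^{\mathrm{pre}}} = \frac{\exp\big(r(X_T)/\alpha\big)}{\mathbb{E}_{P^{\mathrm{pre}}}\big[\exp\big(r(X_T)/\alpha\big)\big]},
\qquad
p^{\star}_T(x) \propto p^{\mathrm{pre}}_T(x)\,\exp\big(r(x)/\alpha\big)

% Controlled SDE: pre-trained drift f plus a control u, same sigma and same law of X_0
%   dX_t = \big(f(t,X_t) + u(t,X_t)\big)\,dt + \sigma(t)\,dW_t
% Girsanov's theorem then turns the KL penalty into a quadratic control cost:
\mathrm{KL}\big(P^{u} \,\|\, P^{\mathrm{pre}}\big)
  = \mathbb{E}_{P^{u}}\!\left[\int_0^T \frac{\lVert u(t,X_t)\rVert^2}{2\,\sigma(t)^2}\,dt\right]
```

One caveat of this generic form: the tilted optimum P* also reweights the starting state X_0. With a fixed, non-degenerate initial distribution, a drift control alone therefore cannot reproduce P* exactly. Starting from a deterministic point, or adjusting the initial distribution as well, removes that gap.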

Contribution

It proposes a novel entropy-regularized control framework for diffusion models that improves goal-directed sample generation while preserving diversity and providing robustness against imperfections in the reward model.
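To make the control view concrete, here is a minimal Monte Carlo sketch. It is not the paper's algorithm. It simulates a one-dimensional controlled SDE dX = (f + u) dt + σ dW with the Euler-Maruyama scheme and estimates the entropy-regularized objective E[r(X_T)] - α·E[∫ u²/(2σ²) dt]. Every component is a hypothetical toy: the pre-trained drift `f_pre` (an Ornstein-Uhlenbeck pull toward 0), the constant diffusion coefficient `sigma`, the terminal `reward` (peaked at x = 2), and a constant `control` of size `shift`, which stands in for a learned control network.

```python
import math
import random


def f_pre(t, x):
    """Hypothetical pre-trained drift: an Ornstein-Uhlenbeck pull toward 0."""
    return -x


def sigma(t):
    """Hypothetical constant diffusion coefficient."""
    return 1.0


def reward(x):
    """Hypothetical terminal reward that prefers samples near x = 2."""
    return -(x - 2.0) ** 2


def control(t, x, shift):
    """Hypothetical control: a constant push of size `shift` (stands in for a learned network)."""
    return shift


def estimate_objective(shift, alpha, n_paths=2000, n_steps=100, horizon=1.0, seed=0):
    """Monte Carlo estimate of E[r(X_T)] - alpha * E[int_0^T u^2 / (2 sigma^2) dt].

    Returns (objective, mean reward, mean control cost).
    """
    rng = random.Random(seed)  # same seed per call -> common random numbers across shifts
    dt = horizon / n_steps
    total_reward = 0.0
    total_cost = 0.0
    for _ in range(n_paths):
        x = 0.0  # hypothetical fixed initial state X_0 = 0
        cost = 0.0
        for k in range(n_steps):
            t = k * dt
            u = control(t, x, shift)
            s = sigma(t)
            # Per-step KL between the controlled and uncontrolled Gaussian transitions.
            cost += u * u / (2.0 * s * s) * dt
            # Euler-Maruyama step for dX = (f + u) dt + sigma dW.
            x += (f_pre(t, x) + u) * dt + s * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        total_reward += reward(x)
        total_cost += cost
    mean_reward = total_reward / n_paths
    mean_cost = total_cost / n_paths
    return mean_reward - alpha * mean_cost, mean_reward, mean_cost


for shift in (0.0, 1.0, 2.0, 4.0):
    obj, r_mean, c_mean = estimate_objective(shift, alpha=1.0)
    print(f"shift={shift}: objective {obj:.3f}, reward {r_mean:.3f}, control cost {c_mean:.3f}")
```

With alpha = 1, a moderate shift should come out ahead of both extremes. A zero shift leaves samples far from the reward peak, while a large shift pays a control cost that grows quadratically. In each Euler-Maruyama step, the accumulated cost u²·dt/(2σ²) equals the KL divergence between the controlled and uncontrolled Gaussian transitions from the same state. This is why it acts as the discrete counterpart of the path-space KL penalty.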

Findings

1. Efficiently generates diverse, high-reward samples.
2. Mitigates reward collapse in goal-directed diffusion model fine-tuning.
3. Theoretically and empirically demonstrates robustness against imperfect rewards.

Abstract

Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins. While diffusion models are trained to represent the distribution in the training dataset, we are often more concerned with other properties, such as the aesthetic quality of the generated images or the functional properties of generated proteins. Diffusion models can be fine-tuned in a goal-directed way by maximizing the value of some reward function (e.g., the aesthetic quality of an image). However, these approaches may lead to reduced sample diversity, significant deviations from the training data distribution, and even poor sample quality due to the exploitation of an imperfect reward function. The last issue often occurs when the reward function is a learned model meant to approximate a ground-truth "genuine" reward, as is the case in many practical applications. These…
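The reward-exploitation failure described above, and the way a KL (entropy) penalty counteracts it, can be illustrated on a toy discrete distribution. This is a hedged sketch, not the paper's method. It uses the standard closed form of KL-regularized reward maximization, p*(x) ∝ p_pre(x)·exp(r̂(x)/α). All data are made up for illustration: the five outcomes, the pre-trained probabilities `p_pre`, the `true_reward`, and the imperfect `learned_reward`, which wrongly scores a rare outcome very highly.

```python
import math

# Hypothetical toy setup: five outcomes of a "pre-trained" model.
p_pre = [0.4, 0.3, 0.2, 0.0999, 0.0001]
true_reward = [0.0, 1.0, 2.0, 2.0, -5.0]     # ground-truth ("genuine") reward
learned_reward = [0.0, 1.0, 2.0, 2.0, 6.0]   # imperfect model: overrates the rare outcome 4


def tilt(p, reward, alpha):
    """KL-regularized optimum: p*(x) proportional to p(x) * exp(reward(x) / alpha)."""
    logw = [math.log(pi) + ri / alpha for pi, ri in zip(p, reward)]
    m = max(logw)  # subtract the max log-weight for numerical stability
    w = [math.exp(lw - m) for lw in logw]
    z = sum(w)
    return [wi / z for wi in w]


def entropy(p):
    """Shannon entropy in nats (a simple diversity measure)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def expected(p, f):
    """Expectation of f under the discrete distribution p."""
    return sum(pi * fi for pi, fi in zip(p, f))


# Pure reward maximization collapses onto the single outcome the learned reward likes best.
greedy = max(range(len(learned_reward)), key=lambda i: learned_reward[i])
print(f"greedy: outcome {greedy}, true reward {true_reward[greedy]:.2f}, entropy 0.00")

print(f"pre-trained: true reward {expected(p_pre, true_reward):.3f}, entropy {entropy(p_pre):.3f}")
for alpha in (10.0, 1.0, 0.1):
    p_star = tilt(p_pre, learned_reward, alpha)
    print(f"alpha={alpha:>4}: true reward {expected(p_star, true_reward):.3f}, "
          f"entropy {entropy(p_star):.3f}, mass on outcome 4 {p_star[4]:.4f}")
```

As alpha shrinks, the tilted distribution approaches the greedy solution and piles its mass onto the overrated outcome. At a moderate alpha, it keeps most of its mass where the pre-trained model has support. Its true reward therefore improves on the pre-trained model, and its entropy stays well above zero.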

Taxonomy

Topics: Numerical methods in inverse problems · Advanced Mathematical Modeling in Engineering · Model Reduction and Neural Networks

Methods: Diffusion