Maximize Your Diffusion: A Study into Reward Maximization and Alignment for Diffusion-based Control

Dom Huh; Prasant Mohapatra

arXiv:2502.12198·cs.LG·February 19, 2025

Open Access

TL;DR

This paper studies how to unify fine-tuning methods for diffusion-based control and empirically evaluates their effect on reward maximization and alignment in decision-making tasks.

Contribution

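The paper introduces a unified paradigm that combines four fine-tuning approaches for diffusion models (reward alignment through reinforcement learning, direct preference optimization, supervised fine-tuning, and cascading diffusion) to improve reward maximization in control applications.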

Findings

01. Empirical improvements in offline RL control tasks
02. Effective reward alignment through combined fine-tuning methods
03. Enhanced decision-making performance over existing approaches

Abstract

Diffusion-based planning, learning, and control methods present a promising branch of powerful and expressive decision-making solutions. Given the growing interest, such methods have undergone numerous refinements over the past years. However, despite these advancements, existing methods are limited in their investigations regarding general methods for reward maximization within the decision-making process. In this work, we study extensions of fine-tuning approaches for control applications. Specifically, we explore extensions and various design choices for four fine-tuning approaches: reward alignment through reinforcement learning, direct preference optimization, supervised fine-tuning, and cascading diffusion. We optimize their usage to merge these independent efforts into one unified paradigm. We show the utility of such propositions in offline RL settings and demonstrate empirical…

Taxonomy

Topics: Simulation Techniques and Applications