Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models
Sangwoong Yoon, Himchan Hwang, Dohyun Kwon, Yung-Kyun Noh, Frank C. Park

TL;DR
This paper introduces a maximum entropy IRL framework for diffusion models, jointly training them with EBMs to improve sample quality with fewer steps and stabilize EBM training.
Contribution
It proposes DxMI, a joint training method for diffusion models and EBMs using IRL principles, and introduces DxDP, a new RL algorithm for efficient diffusion model updates.
Findings
High-quality samples with as few as 4-10 steps.
EBM training stabilized without MCMC.
Enhanced anomaly detection performance.
Abstract
We present a maximum entropy inverse reinforcement learning (IRL) approach for improving the sample quality of diffusion generative models, especially when the number of generation time steps is small. Similar to how IRL trains a policy based on the reward function learned from expert demonstrations, we train (or fine-tune) a diffusion model using the log probability density estimated from training data. Since we employ an energy-based model (EBM) to represent the log density, our approach boils down to the joint training of a diffusion model and an EBM. Our IRL formulation, named Diffusion by Maximum Entropy IRL (DxMI), is a minimax problem that reaches equilibrium when both models converge to the data distribution. The entropy maximization plays a key role in DxMI, facilitating the exploration of the diffusion model and ensuring the convergence of the EBM. We also propose Diffusion by…
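The minimax structure described in the abstract can be illustrated with a standard maximum entropy IRL objective. This is a sketch under stated assumptions, not necessarily the exact formulation used in the paper: the symbols $E_\theta$ (EBM energy), $\tau > 0$ (temperature), $\pi$ (distribution of the diffusion model's final samples), $\mathcal{H}$ (differential entropy), and $Z_\theta$ (normalizing constant) are introduced here for illustration, with the EBM density taken to be $q_\theta(x) = \exp(-E_\theta(x)/\tau)/Z_\theta$.

```latex
% Assumed notation (not from the abstract): p = data distribution,
% \pi = sampler distribution, E_\theta = energy, \tau = temperature.
\begin{align}
  &\min_{\theta}\,\max_{\pi}\;
    \mathbb{E}_{x\sim p}\!\left[E_\theta(x)\right]
    - \mathbb{E}_{x\sim \pi}\!\left[E_\theta(x)\right]
    + \tau\,\mathcal{H}(\pi) \\
  % Inner problem: entropy-regularized reward maximization with reward -E_\theta.
  &\max_{\pi}\; -\mathbb{E}_{x\sim \pi}\!\left[E_\theta(x)\right] + \tau\,\mathcal{H}(\pi)
    = \tau \log Z_\theta,
    \quad \text{attained at } \pi(x) = q_\theta(x) \\
  % Outer problem: substituting the inner optimum gives maximum likelihood for q_\theta.
  &\min_{\theta}\; \mathbb{E}_{x\sim p}\!\left[E_\theta(x)\right] + \tau \log Z_\theta
    = \min_{\theta}\; -\tau\,\mathbb{E}_{x\sim p}\!\left[\log q_\theta(x)\right]
\end{align}
```

Under these assumptions, the inner maximization pulls the sampler toward the EBM density, and the outer minimization reduces to maximum likelihood for the EBM, so the equilibrium has both models matching the data distribution. Without the entropy term, the inner problem would favor collapsing onto the lowest-energy points, which is one way to read the abstract's remark that entropy maximization supports exploration and EBM convergence.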
Taxonomy
Topics: Model Reduction and Neural Networks · Advancements in Semiconductor Devices and Circuit Design · Iterative Learning Control Systems
Methods: Diffusion · energy-based model
