DIME: Diffusion-Based Maximum Entropy Reinforcement Learning
Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann

TL;DR
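DIME is a diffusion-based maximum entropy reinforcement learning framework that increases policy expressiveness and improves exploration. On complex control tasks it outperforms existing diffusion-based RL methods and is competitive with state-of-the-art non-diffusion methods, while requiring fewer algorithmic choices and less computation.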
Contribution
The paper develops a novel diffusion-based MaxEnt-RL method with a provably convergent policy iteration scheme, overcoming entropy intractability and improving high-dimensional control performance.
Findings
Outperforms existing diffusion-based RL methods on challenging benchmarks.
Achieves competitive results with state-of-the-art non-diffusion RL methods.
Requires fewer algorithmic choices and less computation, simplifying implementation.
Abstract
Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of…
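For orientation, the maximum entropy objective that the abstract refers to is usually written in the following standard form. This is a sketch in common textbook notation, not copied from the paper: the discount factor \gamma, the temperature \alpha, the reward r, and the entropy symbol \mathcal{H} are assumed here, and the paper's own notation may differ. The entropy term is the quantity that is available in closed form for a Gaussian policy but intractable for the marginal of a diffusion policy, which is why the abstract speaks of a lower bound.

```latex
% Standard MaxEnt-RL objective (assumed notation, not taken from the paper):
% expected discounted return plus an entropy bonus weighted by \alpha.
J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big].
```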
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Applications · Smart Grid Energy Management
Methods: Distance to Modelled Embedding · Diffusion
