Maximum Entropy Reinforcement Learning with Diffusion Policy

Xiaoyi Dong; Jian Cheng; Xi Sheryl Zhang

arXiv:2502.11612 · cs.LG · June 9, 2025

TL;DR

This paper introduces MaxEntDP, a diffusion-model policy for MaxEnt RL. Compared with Gaussian policies, it improves exploration and performance, especially in complex environments, as demonstrated on MuJoCo benchmarks.

Contribution

The paper proposes using diffusion models as the policy representation in MaxEnt RL. This enables better exploration and lets the policy model complex, multimodal distributions that traditional unimodal Gaussian policies cannot capture.
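For reference, the MaxEnt RL objective can be written in its standard textbook form, the one SAC optimizes. The notation below (discount $\gamma$, temperature $\alpha$, soft Q-function $Q^{*}$) follows common convention and is an assumption here, not taken from this page:

```latex
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}
  \Bigl( r(s_t, a_t) + \alpha\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \Bigr)\right],
\qquad
\mathcal{H}\bigl(\pi(\cdot \mid s)\bigr)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[\log \pi(a \mid s)\bigr]

\pi^{*}(a \mid s) \;\propto\; \exp\!\left(\frac{Q^{*}(s, a)}{\alpha}\right)
```

The second line is the well-known Boltzmann form of the optimal MaxEnt policy. When $Q^{*}$ has several high-value regions, this distribution is multimodal, which is the case a single Gaussian cannot represent.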

Findings

01. MaxEntDP outperforms Gaussian policies on MuJoCo benchmarks.

02. Diffusion policies achieve comparable results to state-of-the-art diffusion RL methods.

03. Enhanced exploration and policy robustness in complex environments.

Abstract

The Soft Actor-Critic (SAC) algorithm with a Gaussian policy has become a mainstream implementation for realizing the Maximum Entropy Reinforcement Learning (MaxEnt RL) objective, which incorporates entropy maximization to encourage exploration and enhance policy robustness. While the Gaussian policy performs well on simpler tasks, its exploration capacity and potential performance in complex multi-goal RL environments are limited by its inherent unimodality. In this paper, we employ the diffusion model, a powerful generative model capable of capturing complex multimodal distributions, as the policy representation to fulfill the MaxEnt RL objective, developing a method named MaxEnt RL with Diffusion Policy (MaxEntDP). Our method enables efficient exploration and brings the policy closer to the optimal MaxEnt policy. Experimental results on MuJoCo benchmarks show that MaxEntDP…
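The unimodality limitation can be illustrated with a small, self-contained sketch. It is not the paper's method or code: the toy two-goal Q-function, the temperature `ALPHA`, and the grid are all hypothetical choices, and the Gaussian is fit by simple moment matching rather than by SAC's actual update. The target is the standard Boltzmann form of the optimal MaxEnt policy, proportional to exp(Q(a) / alpha).

```python
import math

ALPHA = 0.1  # temperature (hypothetical value chosen for illustration)


def q_value(a):
    """Toy two-goal Q-function: equally good actions at a = -1 and a = +1."""
    return -min((a - 1.0) ** 2, (a + 1.0) ** 2)


# Discretize the 1-D action interval [-2, 2].
N = 401
actions = [-2.0 + 4.0 * i / (N - 1) for i in range(N)]
step = actions[1] - actions[0]

# Boltzmann target: density proportional to exp(Q(a) / ALPHA), normalized on the grid.
weights = [math.exp(q_value(a) / ALPHA) for a in actions]
z = sum(weights) * step
target = [w / z for w in weights]

# Fit a single Gaussian to the target by matching its mean and variance.
mean = sum(a * p for a, p in zip(actions, target)) * step
var = sum((a - mean) ** 2 * p for a, p in zip(actions, target)) * step


def gaussian_pdf(a, mu, variance):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((a - mu) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


idx_zero = N // 2            # grid index of a = 0 (between the two goals)
idx_one = 3 * (N - 1) // 4   # grid index of a = +1 (one of the goals)

print("target density at a=0:  ", target[idx_zero])
print("target density at a=1:  ", target[idx_one])
print("Gaussian density at a=0:", gaussian_pdf(0.0, mean, var))
print("Gaussian density at a=1:", gaussian_pdf(1.0, mean, var))
```

In this toy setting the moment-matched Gaussian is centered between the two goals, so it places its highest density on an action the target considers poor. A reverse-KL fit, closer to what SAC minimizes, would instead tend to collapse onto one mode and ignore the other. Either way a single Gaussian cannot cover both goals, which is the gap a multimodal policy class such as a diffusion model is meant to close.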

Code & Models

Repositories

diffusionyes/maxentdp (JAX, official)

Videos

Maximum Entropy Reinforcement Learning with Diffusion Policy · SlidesLive

Taxonomy

Topics: Adaptive Dynamic Programming Control

Methods: Diffusion