Diffusion Actor-Critic with Entropy Regulator

Yinuo Wang; Likun Wang; Yuxuan Jiang; Wenjun Zou; Tong Liu; Xujie Song; Wenxuan Wang; Liming Xiao; Jiang Wu; Jingliang Duan; Shengbo Eben Li

arXiv:2405.15177 · cs.LG · December 24, 2024

TL;DR

This paper introduces DACER, an online reinforcement learning algorithm that uses the reverse process of a diffusion model as its policy. The diffusion policy can represent multimodal action distributions, and an entropy regulator adapts the amount of exploration. The paper reports state-of-the-art results on MuJoCo benchmarks.

Contribution

The paper proposes a diffusion-based policy framework with entropy regulation for online reinforcement learning, increasing the policy's representational capacity and giving explicit control over exploration.

Findings

01 Achieves state-of-the-art performance on MuJoCo benchmarks.

02 Demonstrates stronger policy representational capacity.

03 Effectively balances exploration and exploitation.

Abstract

Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution with learned mean and variance, which constrains their capability to acquire complex policies. In response to this problem, we propose an online RL algorithm termed diffusion actor-critic with entropy regulator (DACER). This algorithm conceptualizes the reverse process of the diffusion model as a novel policy function and leverages the capability of the diffusion model to fit multimodal distributions, thereby enhancing the representational capacity of the policy. Since the distribution of the diffusion policy lacks an analytical expression, its entropy cannot be determined analytically. To mitigate this, we propose a method to estimate the entropy of the…
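
To make the idea of "the reverse process as a policy" concrete, here is a minimal sketch of sampling one action by iterative denoising. It is an illustration under stated assumptions, not the authors' implementation. The linear noise schedule, the DDPM-style update rule, the final clipping to [-1, 1], and the names `make_schedule`, `toy_noise_predictor`, and `sample_action` are all hypothetical. In a real agent the noise predictor would be a learned network conditioned on the state.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption) and cumulative products of 1 - beta."""
    betas = [beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
             for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def toy_noise_predictor(state, action, t):
    """Hypothetical stand-in for a learned network eps_theta(state, a_t, t)."""
    target = math.tanh(sum(state))  # toy state-dependent action target
    return [a - target for a in action]


def sample_action(state, action_dim, num_steps=20,
                  noise_predictor=toy_noise_predictor, rng=None):
    """Draw an action by running a DDPM-style reverse chain from Gaussian noise."""
    if rng is None:
        rng = random.Random()
    betas, alpha_bars = make_schedule(num_steps)
    # Start from pure noise: a_T ~ N(0, I).
    action = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for t in reversed(range(num_steps)):
        eps = noise_predictor(state, action, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
        mean = [inv_sqrt_alpha * (a - coef * e) for a, e in zip(action, eps)]
        # No fresh noise is added on the final denoising step.
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        action = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    # Clip to a bounded action range (assumed here to be [-1, 1]).
    return [max(-1.0, min(1.0, a)) for a in action]


if __name__ == "__main__":
    print(sample_action([0.1, -0.2], action_dim=3, rng=random.Random(0)))
```

Because the action is the end point of a stochastic chain rather than a draw from a closed-form density, there is no direct formula for its log-probability. This is consistent with the abstract's remark that the entropy of the diffusion policy cannot be determined analytically.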

Code & Models

Repositories

happy-yan/DACER-Diffusion-with-Online-RL (JAX, official)

Taxonomy

Topics: Advanced Thermodynamics and Statistical Mechanics

Methods: Diffusion