Distributional Soft Actor-Critic with Diffusion Policy

Tong Liu; Yinuo Wang; Xujie Song; Wenjun Zou; Liangfa Chen; Likun Wang; Bin Shuai; Jingliang Duan; Shengbo Eben Li

arXiv:2507.01381·cs.LG·July 14, 2025

TL;DR

This paper introduces DSAC-D, a distributional reinforcement learning algorithm that uses diffusion models to learn multimodal policies and value distributions, reducing value estimation bias and improving performance on complex control tasks.

Contribution

The paper presents a novel diffusion-based distributional RL framework that effectively models multimodal policies and reduces value estimation bias, achieving state-of-the-art results.

Findings

1. Achieves over 10% improvement in average return on control tasks.

2. Successfully characterizes multimodal distributions of driving styles.

3. Demonstrates state-of-the-art performance in 9 control tasks.

Abstract

Reinforcement learning has proven highly effective for complex control tasks. Traditional methods typically model the value distribution with a unimodal distribution, such as a Gaussian. However, a unimodal distribution often biases value function estimation, which degrades algorithm performance. This paper proposes a distributional reinforcement learning algorithm called DSAC-D (Distributional Soft Actor-Critic with Diffusion Policy) to address value estimation bias and to obtain multimodal policy representations. By introducing policy entropy and a value distribution function, it establishes a multimodal distributional policy iteration framework that can converge to the optimal policy. A diffusion value network that can accurately characterize multi-peak distributions was constructed by…
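The abstract's point that a unimodal model can misrepresent a multimodal return distribution can be illustrated with a small, self-contained sketch. This is not the DSAC-D method or its diffusion value network; it only fits a single Gaussian to synthetic two-peak returns. The function names, the mode locations (-1 and +1), and the noise scale are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean, pstdev


def sample_bimodal_returns(n, seed=0):
    """Draw synthetic returns from two equally likely peaks at -1 and +1.

    The peak locations and the 0.1 noise scale are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [rng.gauss(-1.0 if rng.random() < 0.5 else 1.0, 0.1) for _ in range(n)]


def fit_gaussian(samples):
    """Moment-match a single (unimodal) Gaussian to the samples."""
    return NormalDist(fmean(samples), pstdev(samples))


def mass_between(dist, lo, hi):
    """Probability mass the fitted Gaussian assigns to the interval (lo, hi)."""
    return dist.cdf(hi) - dist.cdf(lo)


returns = sample_bimodal_returns(10_000)
fit = fit_gaussian(returns)

# Share of actual samples that fall in the gap between the two peaks.
empirical_mass = sum(-0.5 < r < 0.5 for r in returns) / len(returns)
# Share of probability the unimodal fit places in that same gap.
fitted_mass = mass_between(fit, -0.5, 0.5)

print(f"fitted mean={fit.mean:.3f}, stdev={fit.stdev:.3f}")
print(f"mass in (-0.5, 0.5): empirical={empirical_mass:.3f}, gaussian={fitted_mass:.3f}")
```

Under these assumptions, almost no samples fall between the peaks, yet the fitted Gaussian centers itself near zero and places a large share of its probability mass there. That mismatch is the kind of error a multimodal value representation is meant to avoid.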
