Distributional Soft Actor-Critic with Diffusion Policy
Tong Liu, Yinuo Wang, Xujie Song, Wenjun Zou, Liangfa Chen, Likun Wang, Bin Shuai, Jingliang Duan, Shengbo Eben Li

TL;DR
This paper introduces DSAC-D, a distributional reinforcement learning algorithm that uses diffusion models to learn multimodal policies and value distributions, reducing value estimation bias and improving performance on complex control tasks.
Contribution
The paper presents a novel diffusion-based distributional RL framework that effectively models multimodal policies and reduces value estimation bias, achieving state-of-the-art results.
Findings
Achieves over 10% improvement in average return on control tasks.
Successfully characterizes multimodal distributions of driving styles.
Demonstrates state-of-the-art performance in 9 control tasks.
Abstract
Reinforcement learning has proven highly effective in handling complex control tasks. Traditional methods typically model value distributions with unimodal distributions, such as Gaussians. However, a unimodal distribution often biases value function estimation, leading to poor algorithm performance. This paper proposes a distributional reinforcement learning algorithm called DSAC-D (Distributional Soft Actor-Critic with Diffusion Policy) to address the challenges of estimation bias in value functions and of obtaining multimodal policy representations. A multimodal distributional policy iteration framework that can converge to the optimal policy is established by introducing policy entropy and a value distribution function. A diffusion value network that can accurately characterize multi-peaked distributions is constructed by…
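The abstract's point that a unimodal model can misrepresent a multi-peaked return distribution can be illustrated with a toy sketch. The code below is not DSAC-D and does not use a diffusion model. It assumes a hypothetical return distribution made of two equally likely modes at -5 and +5, fits a single Gaussian by moment matching, and compares how much probability mass each places near the fitted mean. All function names and numbers are illustrative assumptions, not values from the paper.

```python
import math
import random
import statistics

def sample_bimodal_returns(n, seed=0):
    """Draw n hypothetical returns from an equal mixture of N(-5, 1) and N(+5, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(-5.0 if rng.random() < 0.5 else 5.0, 1.0) for _ in range(n)]

def fraction_within(samples, center, radius):
    """Fraction of samples that lie within `radius` of `center`."""
    return sum(abs(x - center) <= radius for x in samples) / len(samples)

def gaussian_mass_within(sigma, radius):
    """Probability mass a Gaussian places within `radius` of its own mean."""
    return math.erf(radius / (sigma * math.sqrt(2.0)))

returns = sample_bimodal_returns(10_000)

# Moment-matched unimodal fit: one Gaussian with the sample mean and std.
mu = statistics.fmean(returns)
sigma = statistics.pstdev(returns)

empirical = fraction_within(returns, mu, 1.0)
gaussian = gaussian_mass_within(sigma, 1.0)

print(f"fitted Gaussian: mean={mu:.2f}, std={sigma:.2f}")
print(f"mass within 1.0 of the mean: empirical={empirical:.4f}, Gaussian fit={gaussian:.4f}")
```

In this toy setting the fitted Gaussian puts its density peak between the two modes, where almost no sampled returns fall. This shows only the mismatch in shape; it does not reproduce how that mismatch becomes value estimation bias during training.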
