Distributional Reinforcement Learning with Diffusion Bridge Critics
Shutong Ding, Yimiao Zhou, Ke Hu, Mokai Pan, Shan Zhong, Yanwei Fu, Jingya Wang, Ye Shi

TL;DR
This paper introduces Diffusion Bridge Critics (DBC), a distributional reinforcement learning method that models the inverse CDF of the Q value with a diffusion bridge model, aiming at more accurate value estimation in continuous control tasks.
Contribution
The paper presents the first use of diffusion bridge models as critics in RL, enhancing distributional value estimation and integrating seamlessly into existing frameworks.
Findings
DBC outperforms previous distributional critic models on MuJoCo benchmarks.
The diffusion bridge critic effectively captures the value distribution without collapsing.
An analytic integral formula reduces discretization errors in value estimation.
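To make the link between an inverse CDF and a scalar value estimate concrete, here is a minimal sketch. It uses the standard identity E[Z] = ∫₀¹ F⁻¹(τ) dτ and shows how approximating that integral with a finite number of quantile levels introduces discretization error. This is not the paper's analytic formula, which is not reproduced on this page. The exponential return distribution and the helper names `exp_inverse_cdf` and `q_from_quantiles` are assumptions chosen purely for illustration.

```python
import math


def exp_inverse_cdf(tau, rate=1.0):
    """Inverse CDF (quantile function) of an exponential distribution.

    Hypothetical stand-in for a learned inverse CDF of the return Z.
    """
    return -math.log(1.0 - tau) / rate


def q_from_quantiles(inverse_cdf, num_quantiles):
    """Midpoint-rule estimate of E[Z] = integral over [0, 1] of F^{-1}(tau)."""
    # Evenly spaced midpoint quantile levels in (0, 1).
    taus = [(i + 0.5) / num_quantiles for i in range(num_quantiles)]
    return sum(inverse_cdf(t) for t in taus) / num_quantiles


true_q = 1.0  # exact mean of Exp(rate=1)
coarse = q_from_quantiles(exp_inverse_cdf, 8)    # few quantile levels
fine = q_from_quantiles(exp_inverse_cdf, 512)    # many quantile levels

print(f"8 quantiles:   Q ~ {coarse:.4f}, error {abs(coarse - true_q):.4f}")
print(f"512 quantiles: Q ~ {fine:.4f}, error {abs(fine - true_q):.4f}")
```

In this toy setting the coarse estimate misses part of the heavy upper tail, and the error shrinks as more quantile levels are used. A closed-form integral, where one is available, would avoid this source of error altogether.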
Abstract
Recent advances in diffusion-based reinforcement learning (RL) methods have demonstrated promising results in a wide range of continuous control tasks. However, existing works in this field focus on the application of diffusion policies while leaving the diffusion critics unexplored. In fact, since policy optimization fundamentally relies on the critic, accurate value estimation is far more important than policy expressiveness. Furthermore, given the stochasticity of most reinforcement learning tasks, it has been confirmed that the critic is more appropriately depicted with a distributional model. Motivated by these points, we propose a novel distributional RL method with Diffusion Bridge Critics (DBC). DBC directly models the inverse cumulative distribution function (CDF) of the Q value. This allows us to accurately capture the value distribution and prevents it from collapsing into a…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Extremum Seeking Control Systems
