Distributional Reinforcement Learning with Diffusion Bridge Critics

Shutong Ding; Yimiao Zhou; Ke Hu; Mokai Pan; Shan Zhong; Yanwei Fu; Jingya Wang; Ye Shi

arXiv:2602.05783 · cs.LG · February 6, 2026

TL;DR

This paper introduces Diffusion Bridge Critics (DBC), a distributional reinforcement learning method that models the inverse cumulative distribution function (CDF) of the Q value with a diffusion bridge, leading to more accurate value estimation in continuous control tasks.

Contribution

The paper presents the first use of diffusion bridge models as critics in RL, enhancing distributional value estimation and integrating seamlessly into existing frameworks.

Findings

01  DBC outperforms previous distributional critic models on MuJoCo benchmarks.

02  The diffusion bridge critic effectively captures the value distribution without collapsing.

03  An analytic integral formula reduces discretization errors in value estimation.
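
To make the link between an inverse CDF and a scalar value estimate concrete, here is a minimal sketch. It is not the paper's analytic formula, which this summary does not state. It relies only on the standard identity that the mean of a distribution equals the integral of its inverse CDF over [0, 1]. The exponential distribution, the function names, and the grid sizes are all assumptions chosen for illustration; the exponential stands in for a learned return distribution because its inverse CDF and mean are known in closed form.

```python
import math

def exp_quantile(tau, rate=1.0):
    """Inverse CDF of Exp(rate); a hypothetical stand-in for a learned critic."""
    return -math.log(1.0 - tau) / rate

def midpoint_mean(quantile_fn, n):
    """Approximate E[Z] = integral over [0, 1] of F^{-1}(tau) with n midpoint levels."""
    taus = [(i + 0.5) / n for i in range(n)]
    return sum(quantile_fn(t) for t in taus) / n

exact_mean = 1.0  # closed-form mean of Exp(rate=1)

# Absolute error of the discretized estimate for a few grid sizes.
errors = {n: abs(midpoint_mean(exp_quantile, n) - exact_mean) for n in (8, 32, 128)}
for n, err in errors.items():
    print(f"n={n:4d}  abs error={err:.4f}")
```

The error shrinks as the number of quantile levels grows but does not vanish at any finite grid size, which is the kind of discretization error a closed-form integral would avoid.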

Abstract

Recent advances in diffusion-based reinforcement learning (RL) methods have demonstrated promising results in a wide range of continuous control tasks. However, existing works in this field focus on the application of diffusion policies while leaving the diffusion critics unexplored. In fact, since policy optimization fundamentally relies on the critic, accurate value estimation is far more important than policy expressiveness. Furthermore, given the stochasticity of most reinforcement learning tasks, it has been confirmed that the critic is more appropriately depicted with a distributional model. Motivated by these points, we propose a novel distributional RL method with Diffusion Bridge Critics (DBC). DBC directly models the inverse cumulative distribution function (CDF) of the Q value. This allows us to accurately capture the value distribution and prevents it from collapsing into a…

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Extremum Seeking Control Systems