Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation

Taehyun Cho; Seungyub Han; Seokhun Ju; Dohyeong Kim; Kyungjae Lee; Jungwoo Lee

arXiv:2407.21260·cs.LG·May 14, 2025

TL;DR

This paper gives a regret analysis of distributional reinforcement learning with general value function approximation, introduces the notion of Bellman unbiasedness, and proposes a provably efficient algorithm with regret bounds for finite episodic MDPs.

Contribution

It introduces Bellman unbiasedness for distributional RL and develops SF-LSVI, an algorithm with tight regret bounds for general value function approximation.

Findings

01

Bellman unbiasedness is essential for learnable distributional updates.

02

Only moment functionals can exactly capture distributional information.

03

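SF-LSVI achieves a regret bound of Õ(d_E H^{3/2} √K), where H is the horizon, K is the number of episodes, and d_E is the eluder dimension of the function class.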
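
For readers unfamiliar with the term, cumulative regret over K episodes in an episodic MDP is conventionally defined as below. The symbols V_1^*, V_1^{\pi_k}, and s_1^k follow the usual textbook convention and are assumed here; they do not appear on this page.

```latex
\mathrm{Reg}(K) \;=\; \sum_{k=1}^{K} \Bigl( V_1^{*}(s_1^{k}) - V_1^{\pi_k}(s_1^{k}) \Bigr)
```

Here \pi_k is the policy played in episode k and s_1^k is that episode's initial state. A bound that grows sublinearly in K means the average per-episode regret vanishes as K grows.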

Abstract

Distributional reinforcement learning improves performance by capturing environmental stochasticity, but a comprehensive theoretical understanding of its effectiveness remains elusive. In addition, the intractable element of the infinite dimensionality of distributions has been overlooked. In this paper, we present a regret analysis of distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting. We first introduce a key notion of Bellman unbiasedness which is essential for exactly learnable and provably efficient distributional updates in an online manner. Among all types of statistical functionals for representing infinite-dimensional return distributions, our theoretical results demonstrate that only moment functionals can exactly capture the statistical information. Secondly, we propose a provably…
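
The claim about moment functionals can be made concrete with a small numerical sketch. It illustrates only the simplest part of the intuition, under assumptions of our own: raw moments are linear under mixing over next states, so averaging per-state moments recovers the moments of the mixed return distribution, whereas a quantile such as the median does not behave this way. The distributions `eta_a` and `eta_b`, the weights, and all helper names are hypothetical, and this is not the paper's formal definition of Bellman unbiasedness.

```python
from fractions import Fraction

# A distribution is a list of (value, probability) pairs with exact
# rational probabilities, so the comparisons below are exact.

def moment(dist, k):
    """k-th raw moment of a finite discrete distribution."""
    return sum(p * Fraction(x) ** k for x, p in dist)

def median(dist):
    """Smallest value whose cumulative probability reaches 1/2."""
    total = Fraction(0)
    for x, p in sorted(dist):
        total += p
        if total >= Fraction(1, 2):
            return Fraction(x)
    raise ValueError("probabilities must sum to at least 1/2")

def mixture(dists, weights):
    """Mix several distributions with the given weights."""
    merged = {}
    for dist, w in zip(dists, weights):
        for x, p in dist:
            merged[x] = merged.get(x, Fraction(0)) + w * p
    return sorted(merged.items())

def weighted_average(values, weights):
    return sum(w * v for v, w in zip(values, weights))

# Hypothetical return distributions at two possible next states,
# each reached with probability 1/2.
eta_a = [(0, Fraction(3, 5)), (10, Fraction(2, 5))]
eta_b = [(10, Fraction(1))]
weights = [Fraction(1, 2), Fraction(1, 2)]
mix = mixture([eta_a, eta_b], weights)

# Moments: averaging per-state moments matches the mixture exactly.
for k in (1, 2):
    per_state = [moment(eta_a, k), moment(eta_b, k)]
    print(k, moment(mix, k), weighted_average(per_state, weights))

# Median: averaging per-state medians does not match the mixture.
per_state_medians = [median(eta_a), median(eta_b)]
print(median(mix), weighted_average(per_state_medians, weights))
```

With these numbers the first two moments agree (7 and 70), while the median of the mixture is 10 and the average of the per-state medians is 5. A sample-based update that averages medians would therefore be biased in this toy setting, while one that averages moments would not.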

Videos

Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Parking Systems Research