Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
Taehyun Cho, Seungyub Han, Seokhun Ju, Dohyeong Kim, Kyungjae Lee, Jungwoo Lee

TL;DR
This paper provides a regret analysis of distributional reinforcement learning with general value function approximation, introduces the concept of Bellman unbiasedness, and proposes a provably efficient algorithm with a regret bound for finite episodic MDPs.
Contribution
It introduces Bellman unbiasedness for distributional RL and develops SF-LSVI, an algorithm with tight regret bounds for general value function approximation.
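Informally, Bellman unbiasedness asks whether a statistic of the target return distribution can be estimated without bias from statistics of sampled next states. The sketch below is a hypothetical toy example, not taken from the paper and not its formal definition. A state moves to next state A with probability 3/5 or to B with probability 2/5, and the returns there are point masses at 0 and 10. Averaging each sampled state's mean or second moment is unbiased for the corresponding moment of the mixture, but averaging each sampled state's median is not. This only shows that the naive plug-in fails for the median; it does not reproduce the paper's general result.

```python
from fractions import Fraction

# Hypothetical toy transition (not from the paper): from state s the agent moves
# to next state A with probability P_A, otherwise to B. Return distributions
# are stored as {return_value: probability}.
P_A = Fraction(3, 5)
DIST_A = {0: Fraction(1)}   # point mass at return 0
DIST_B = {10: Fraction(1)}  # point mass at return 10


def mixture(p_a, dist_a, dist_b):
    """Target return distribution: mixture over next states."""
    out = {}
    for dist, w in ((dist_a, p_a), (dist_b, 1 - p_a)):
        for v, q in dist.items():
            out[v] = out.get(v, 0) + w * q
    return out


def moment(dist, k):
    """k-th raw moment E[Z^k]."""
    return sum(q * v ** k for v, q in dist.items())


def median(dist):
    """Lower median: smallest value whose CDF reaches 1/2."""
    cdf = 0
    for v in sorted(dist):
        cdf += dist[v]
        if cdf >= Fraction(1, 2):
            return v


def expected_plugin(functional):
    """Exact expectation, over one sampled next state, of applying the
    functional to that state's return distribution."""
    return P_A * functional(DIST_A) + (1 - P_A) * functional(DIST_B)


target = mixture(P_A, DIST_A, DIST_B)
for k in (1, 2):
    # Raw moments are linear in the mixture weights, so the plug-in is unbiased.
    print(k, moment(target, k), expected_plugin(lambda d: moment(d, k)))
# prints "1 4 4" then "2 40 40"

# The median is not linear in the mixture weights, so the plug-in is biased.
print("median", median(target), expected_plugin(median))
# prints "median 0 4"
```

Exact fractions and an exact expectation over the sampled next state are used instead of random simulation, so the bias of the median plug-in (4 versus the true median 0) is visible without sampling noise.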
Findings
Bellman unbiasedness is essential for learnable distributional updates.
Only moment functionals can exactly capture distributional information.
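SF-LSVI achieves a regret bound of $\tilde{O}(d_E H^{3/2}\sqrt{K})$, where H is the horizon, K is the number of episodes, and d_E is the eluder dimension of the function class.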
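Dividing a regret bound of the form $\tilde{O}(d_E H^{3/2}\sqrt{K})$ by the number of episodes K shows why it is sublinear: the average per-episode regret vanishes as K grows. A short worked step, treating d_E and H as fixed:

```latex
\frac{\mathrm{Regret}(K)}{K}
  = \tilde{O}\!\left(\frac{d_E H^{3/2}\sqrt{K}}{K}\right)
  = \tilde{O}\!\left(\frac{d_E H^{3/2}}{\sqrt{K}}\right)
  \xrightarrow[K \to \infty]{} 0 .
```

For example, quadrupling K only doubles the bound (up to logarithmic factors), since $\sqrt{4K} = 2\sqrt{K}$.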
Abstract
Distributional reinforcement learning improves performance by capturing environmental stochasticity, but a comprehensive theoretical understanding of its effectiveness remains elusive. In addition, the intractable element of the infinite dimensionality of distributions has been overlooked. In this paper, we present a regret analysis of distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting. We first introduce a key notion of Bellman unbiasedness, which is essential for exactly learnable and provably efficient distributional updates in an online manner. Among all types of statistical functionals for representing infinite-dimensional return distributions, our theoretical results demonstrate that only moment functionals can exactly capture the statistical information. Secondly, we propose a provably…
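One way to see why moment functionals are convenient: with a deterministic reward r and discount γ (simplifying assumptions for this sketch), the first n raw moments of r + γZ follow exactly from the first n raw moments of Z via the binomial theorem, $\mathbb{E}[(r+\gamma Z)^k] = \sum_{j=0}^{k} \binom{k}{j} r^{k-j}\gamma^{j}\,\mathbb{E}[Z^j]$. A finite moment representation can therefore be updated without materializing the full distribution. The helper names and the toy distribution are hypothetical; this illustrates moment closedness under the Bellman map, not the paper's SF-LSVI update.

```python
from fractions import Fraction
from math import comb


def bellman_moments(z_moments, r, gamma):
    """Given raw moments m_j = E[Z^j] for j = 0..n (with m_0 = 1), return the
    raw moments of r + gamma * Z for the same orders, using only m_0..m_n.
    Assumes a deterministic reward r (a simplifying assumption)."""
    n = len(z_moments) - 1
    return [
        sum(comb(k, j) * r ** (k - j) * gamma ** j * z_moments[j]
            for j in range(k + 1))
        for k in range(n + 1)
    ]


def raw_moments(dist, n):
    """Raw moments E[Z^k], k = 0..n, of a discrete {value: prob} distribution."""
    return [sum(p * v ** k for v, p in dist.items()) for k in range(n + 1)]


# Hypothetical next-state return distribution Z and update parameters.
z = {0: Fraction(1, 4), 2: Fraction(1, 2), 4: Fraction(1, 4)}
r, gamma, n = 1, Fraction(1, 2), 3

# Update computed from the moment representation alone ...
via_moments = bellman_moments(raw_moments(z, n), r, gamma)
# ... versus moments of the explicitly transformed distribution.
direct = raw_moments({r + gamma * v: p for v, p in z.items()}, n)
print(via_moments == direct)  # True
```

Averaging over next states is also linear in raw moments, so the same finite representation is preserved under the expectation over transitions.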
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Parking Systems Research
