An Analysis of Categorical Distributional Reinforcement Learning
Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh

TL;DR
This paper provides a theoretical framework for categorical distributional reinforcement learning, clarifying its properties, drawing connections to the Cramér distance, and proving convergence of sample-based algorithms.
Contribution
The paper introduces a framework for analysing categorical distributional reinforcement learning (CDRL), establishes the importance of the projected distributional Bellman operator, and proves convergence of sample-based CDRL algorithms.
Findings
Established the importance of the projected distributional Bellman operator
Drew connections between CDRL and the Cramér distance
Proved convergence of sample-based categorical distributional RL algorithms
Abstract
Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramér distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
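The abstract names two objects without spelling them out: the projection used in the projected distributional Bellman operator, and the Cramér distance. The sketch below illustrates both in plain Python under stated assumptions. It assumes a fixed, sorted support of atoms, a projection that splits each atom's mass linearly between its two neighbouring support points (the form commonly described for C51), and the Cramér distance taken as the L2 distance between cumulative distribution functions. The function names `project_onto_support` and `cramer_distance` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def project_onto_support(atoms, probs, support):
    """Project a discrete distribution onto a fixed, sorted support.

    Each atom is clipped to [support[0], support[-1]] and its mass is
    split linearly between the two nearest support points.
    (Assumed form of the categorical projection; illustrative only.)
    """
    out = [0.0] * len(support)
    lo, hi = support[0], support[-1]
    for z, p in zip(atoms, probs):
        z = min(max(z, lo), hi)            # clip to the support range
        j = bisect_right(support, z) - 1   # largest index with support[j] <= z
        if j >= len(support) - 1:          # z sits on the last support point
            out[-1] += p
        else:
            left, right = support[j], support[j + 1]
            w = (z - left) / (right - left)  # fraction of mass sent to the right
            out[j] += p * (1.0 - w)
            out[j + 1] += p * w
    return out


def cramer_distance(p, q, support):
    """L2 distance between the CDFs of two distributions on the same support."""
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for i in range(len(support) - 1):
        cdf_p += p[i]
        cdf_q += q[i]
        # The CDF difference is constant on [support[i], support[i + 1]).
        total += (cdf_p - cdf_q) ** 2 * (support[i + 1] - support[i])
    return total ** 0.5


# One sample-based backup: reward r, discount gamma, next-state return distribution.
support = [0.0, 1.0, 2.0]
next_probs = [0.5, 0.0, 0.5]
r, gamma = 0.5, 0.5
shifted_atoms = [r + gamma * z for z in support]   # generally off the support
projected = project_onto_support(shifted_atoms, next_probs, support)
```

In this example the shifted atoms 0.5 and 1.5 fall between support points, so the projection spreads their mass over the neighbours while leaving the mean of the distribution unchanged.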
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications
