An Analysis of Categorical Distributional Reinforcement Learning

Mark Rowland; Marc G. Bellemare; Will Dabney; Rémi Munos; Yee Whye Teh

arXiv:1802.08163 · stat.ML · February 23, 2018 · AISTATS · 40 cites

TL;DR

This paper provides a theoretical framework for categorical distributional reinforcement learning (CDRL): it clarifies the role of the projected distributional Bellman operator, connects CDRL to the Cramér distance, and proves convergence of sample-based CDRL algorithms.

Contribution

The paper introduces a framework for analysing CDRL algorithms, establishes the importance of the projected distributional Bellman operator, and proves convergence of sample-based categorical distributional RL algorithms.

Findings

01

Established the importance of the projected distributional Bellman operator

02

Drawn connections between CDRL and the Cramér distance

03

Proved convergence of sample-based categorical distributional RL algorithms

Abstract

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramér distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
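
To make the two objects named in the abstract concrete, here is a minimal NumPy sketch of a categorical projection step and a Cramér distance between categorical distributions. It is not taken from the paper: the function names `project_onto_support` and `cramer_distance`, the five-atom support, and the reward and discount values are all illustrative assumptions. The projection follows the commonly described C51-style rule, in which each shifted atom's mass is split between its two nearest support points.

```python
import numpy as np

def project_onto_support(atoms, probs, support):
    """Project a discrete distribution onto a fixed, evenly spaced support.

    Each atom's mass is split between its two neighbouring support points
    in proportion to proximity; atoms outside the support range are first
    clipped to the nearest endpoint. (Hypothetical helper for illustration.)
    """
    support = np.asarray(support, dtype=float)
    z_min, z_max = support[0], support[-1]
    delta = support[1] - support[0]
    projected = np.zeros_like(support)
    for z, p in zip(atoms, probs):
        z = min(max(z, z_min), z_max)        # clip to [z_min, z_max]
        b = (z - z_min) / delta              # fractional index on the support
        lo = int(np.floor(b))
        hi = min(lo + 1, len(support) - 1)
        w_hi = b - lo                        # share of mass for the upper neighbour
        projected[lo] += p * (1.0 - w_hi)
        projected[hi] += p * w_hi
    return projected

def cramer_distance(p, q, support):
    """Cramér (l2) distance between two distributions on the same evenly
    spaced support: the L2 norm of the difference of their CDFs."""
    delta = support[1] - support[0]
    cdf_gap = np.cumsum(p) - np.cumsum(q)
    # The CDFs agree (both equal 1) from the last atom onwards, so drop it.
    return float(np.sqrt(delta * np.sum(cdf_gap[:-1] ** 2)))

# Assumed toy setup: 5 atoms on [0, 4], uniform next-state distribution.
support = np.linspace(0.0, 4.0, 5)
next_probs = np.full(5, 0.2)
reward, gamma = 1.0, 0.5                      # illustrative values only

# Sample Bellman target: shift and scale the atoms, then project back.
target_atoms = reward + gamma * support
projected = project_onto_support(target_atoms, next_probs, support)
gap = cramer_distance(projected, next_probs, support)
```

In this toy setup the shifted atoms are 1, 1.5, 2, 2.5, and 3, so the projected probabilities come out to approximately [0, 0.3, 0.4, 0.3, 0]. The projected distribution keeps the target's mean of 2 because no atom was clipped.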

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications