Distributional reinforcement learning with linear function approximation

Marc G. Bellemare; Nicolas Le Roux; Pablo Samuel Castro; Subhodeep Moitra

arXiv:1902.03149 · cs.LG · February 11, 2019 · 6 cites

Open Access

TL;DR

This paper introduces a distributional reinforcement learning algorithm based on the Cramér distance that works with linear function approximation. It provides what the authors believe is the first convergence proof for such a method, and notes that it may underperform direct value-function approximation.

Contribution

The paper adapts the Cramér distance to arbitrary real vectors and derives from it a fully Cramér-based distributional algorithm with formal guarantees for policy evaluation under linear function approximation.
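The Cramér (ℓ2) distance between two distributions integrates the squared difference of their CDFs. For probability vectors on a shared, evenly spaced support with spacing δ, that integral reduces to δ times the sum of squared gaps between the vectors' cumulative sums, and this cumulative-sum form stays defined when the vectors are arbitrary reals. The sketch below assumes exactly that form; it illustrates the idea and may differ from the paper's precise generalization (for example in weighting or in how the last atom is treated). The function name `cramer_sq` is hypothetical.

```python
import numpy as np

def cramer_sq(p, q, delta=1.0):
    """Squared Cramér-style distance between two vectors on a shared support
    z_1 < ... < z_K with uniform spacing `delta`.

    Assumption: the CDF is replaced by the vector of cumulative sums, which is
    defined for any real vector. The paper's exact definition may differ.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cdf_gap = np.cumsum(p - q)          # F_p(z_i) - F_q(z_i) at each atom i
    return delta * float(np.sum(cdf_gap ** 2))

# Point masses at the two ends of a 3-atom support (spacing 1): the CDFs differ
# by 1 on an interval of length 2, so the squared Cramér distance is 2.
print(cramer_sq([1, 0, 0], [0, 0, 1]))  # 2.0

# The same formula accepts vectors that are not probability distributions.
print(cramer_sq([0.5, -0.2, 0.9], [0, 0, 1]))
```

Written as δ‖C(p − q)‖², with C the lower-triangular matrix of ones, the distance is a quadratic form in p − q. With a linear model p = Wφ(s) it is therefore quadratic in the weights W, which is plausibly the kind of structure that makes an analysis under linear function approximation tractable.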

Findings

01. First convergence proof, to the authors' knowledge, of distributional RL with linear function approximation (in policy evaluation).
02. Cramér-based methods may underperform direct value-function approximation.
03. The new method generalizes distributional RL to arbitrary real vectors, at the cost of the probabilistic interpretation.

Abstract

Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramér distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramér distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramér-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of…
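A minimal sketch of how the pieces in the abstract could fit together for policy evaluation, under our own assumptions: a linear model Wφ(s) predicts an arbitrary real vector over a fixed support (no softmax), the target is built with C51's categorical projection of r + γz onto the support, and the loss is the cumulative-sum Cramér distance ½‖C(target − prediction)‖², minimized by a semi-gradient step that holds the target fixed. The names `project` and `td_update`, the step size, and folding the spacing δ into α are hypothetical; this is an illustration, not the paper's exact algorithm.

```python
import numpy as np

def project(next_vec, reward, gamma, support):
    """C51-style categorical projection of the atoms reward + gamma * z onto z.

    The map is linear in `next_vec`, so it applies unchanged to arbitrary real
    vectors, not only to probability vectors.
    """
    K = len(support)
    v_min, v_max = support[0], support[-1]
    delta = support[1] - support[0]
    shifted = np.clip(reward + gamma * support, v_min, v_max)
    b = np.clip((shifted - v_min) / delta, 0, K - 1)  # fractional atom index
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    target = np.zeros(K)
    for j in range(K):
        if lower[j] == upper[j]:        # shifted atom lands exactly on an atom
            target[lower[j]] += next_vec[j]
        else:                           # split linearly between the two neighbours
            target[lower[j]] += next_vec[j] * (upper[j] - b[j])
            target[upper[j]] += next_vec[j] * (b[j] - lower[j])
    return target


def td_update(W, phi_s, phi_next, reward, gamma, support, alpha):
    """One semi-gradient step on 0.5 * ||C (target - W phi_s)||^2 (hypothetical sketch).

    W has shape (K, d); predictions W @ phi are arbitrary real K-vectors (no softmax).
    The support spacing delta is absorbed into the step size alpha.
    """
    K = len(support)
    C = np.tril(np.ones((K, K)))        # (C @ v)[i] = v[0] + ... + v[i]
    pred = W @ phi_s
    target = project(W @ phi_next, reward, gamma, support)  # held constant
    err = C.T @ C @ (target - pred)     # minus the loss gradient w.r.t. pred
    return W + alpha * np.outer(err, phi_s)


# Usage: one update for a single state that loops to itself with reward 1.
support = np.linspace(0.0, 4.0, 9)      # atoms 0.0, 0.5, ..., 4.0
phi = np.array([1.0])                   # constant feature for that state
W = np.full((9, 1), 1.0 / 9)            # start from a uniform prediction
W = td_update(W, phi, phi, reward=1.0, gamma=0.5, support=support, alpha=0.05)
```

Because the projection is linear and the loss is quadratic in the prediction, each step is a linear, TD-style update in W. Whether and how fast such an iteration converges depends on conditions of the kind the paper analyses; this sketch does not reproduce that analysis.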

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Softmax