Distributional reinforcement learning with linear function approximation
Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra

TL;DR
This paper introduces a distributional reinforcement learning algorithm based on the Cramér distance that works with linear function approximation. It gives what the authors believe to be the first convergence proof for such methods, and notes that the approach may underperform direct value function approximation.
Contribution
The paper adapts the Cramér distance to arbitrary real vectors and derives from it a distributional algorithm that carries formal guarantees for policy evaluation with linear function approximation.
Findings
To the authors' knowledge, the first convergence proof for distributional RL with linear function approximation.
Cramér-based methods may underperform direct value function approximation.
The new method generalizes distributional RL predictions to arbitrary real vectors, at the cost of their probabilistic interpretation.
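To make the "arbitrary real vectors" point concrete, here is a minimal sketch of a squared Cramér distance computed from cumulative sums. It assumes both vectors live on the same evenly spaced support; the function name `cramer_distance_sq` and the `spacing` parameter are illustrative and not taken from the paper, whose exact definition is not reproduced in this summary. The same formula is simply applied to vectors that need not be non-negative or sum to one.

```python
from itertools import accumulate


def cramer_distance_sq(p, q, spacing=1.0):
    """Squared Cramér distance between two vectors on a shared support.

    Assumption: the support is evenly spaced with gap `spacing`, so the
    distance reduces to a sum of squared differences of cumulative sums.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    cum_p = accumulate(p)  # running sums play the role of a CDF
    cum_q = accumulate(q)
    return spacing * sum((a - b) ** 2 for a, b in zip(cum_p, cum_q))


# Two proper probability vectors on a 3-point support.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
d_prob = cramer_distance_sq(p, q)

# The same computation on a vector with a negative entry and total mass
# different from one: still well defined, but no longer a distance
# between probability distributions.
u = [1.5, -0.5, 0.0]
d_vec = cramer_distance_sq(u, q)
```

Nothing in the computation requires the inputs to be normalized, which is why a model trained against such a loss can output any real vector; the price is that its outputs can no longer be read as probabilities.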
Abstract
Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramér distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramér distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramér-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of…
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Softmax
