Foundations of Multivariate Distributional Reinforcement Learning

Harley Wiltzer; Jesse Farebrother; Arthur Gretton; Mark Rowland

arXiv:2409.00328 · cs.LG · September 5, 2024

TL;DR

This paper develops the first oracle-free, computationally tractable algorithms for provably convergent multivariate distributional reinforcement learning, addressing the challenges posed by multi-dimensional reward signals and shedding light on the tradeoffs between return-distribution representations.

Contribution

It introduces oracle-free, computationally tractable algorithms for multivariate distributional RL with convergence guarantees, along with new analysis techniques for return distributions under multi-dimensional rewards.
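To make the object of study concrete: with a vector-valued reward, the return $G = \sum_{t \ge 0} \gamma^t R_t$ is a random vector in $\mathbb{R}^d$, and multivariate distributional RL models its joint law. The sketch below draws Monte Carlo samples of that return, assuming a hypothetical two-state Markov reward process with deterministic 2-dimensional rewards and trajectories truncated at a fixed horizon. The toy process and the function names are illustrative only and are not taken from the paper.

```python
import numpy as np


def discounted_return(rewards: np.ndarray, gamma: float) -> np.ndarray:
    """Discounted sum of a (T, d) array of reward vectors, giving a vector in R^d."""
    discounts = gamma ** np.arange(rewards.shape[0])  # (T,)
    return discounts @ rewards  # (T,) @ (T, d) -> (d,)


def sample_returns(P, R, gamma, x0, n_samples, horizon, rng):
    """Monte Carlo samples of the multivariate return from start state x0.

    P: (S, S) transition matrix of a toy Markov reward process (hypothetical).
    R: (S, d) deterministic reward vector emitted in each state (hypothetical).
    Trajectories are truncated after `horizon` steps, so samples approximate G.
    """
    samples = np.empty((n_samples, R.shape[1]))
    for i in range(n_samples):
        x, rewards = x0, []
        for _ in range(horizon):
            rewards.append(R[x])
            x = rng.choice(len(P), p=P[x])  # sample the next state
        samples[i] = discounted_return(np.array(rewards), gamma)
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = np.array([[0.9, 0.1], [0.5, 0.5]])
    R = np.array([[1.0, 0.0], [0.0, 1.0]])  # d = 2 reward coordinates
    G = sample_returns(P, R, gamma=0.9, x0=0, n_samples=500, horizon=50, rng=rng)
    print("mean return vector:", G.mean(axis=0))
    print("correlation of coordinates:", np.corrcoef(G.T)[0, 1])
```

In this toy process the coordinates of every reward vector sum to 1, so each sampled return satisfies $G_1 + G_2 = \sum_{t<H} \gamma^t$. The two coordinates are therefore perfectly dependent, a structure that the two marginal return distributions, viewed separately, would not reveal.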

Findings

01

Convergence rates match the familiar rates of the scalar-reward setting.

02

When the reward dimension exceeds 1, the standard analysis of categorical TD learning fails; this is resolved by a novel projection onto the space of mass-1 signed measures.

03

Tradeoffs in distribution representations affect practical performance.
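Finding 02 refers to a projection onto mass-1 signed measures. One natural way to realize such a projection, sketched here under explicit assumptions, works in the maximum mean discrepancy (MMD) geometry of a kernel: given a target measure and a fixed grid of atoms, choose weights $w$ on the atoms (allowed to be negative) that sum to 1 and minimize the MMD to the target. This is the equality-constrained quadratic program $\min_w w^\top K w - 2 w^\top b$ subject to $\mathbf{1}^\top w = 1$, whose solution is $w = K^{-1} b - \lambda K^{-1} \mathbf{1}$ with $\lambda = (\mathbf{1}^\top K^{-1} b - 1) / (\mathbf{1}^\top K^{-1} \mathbf{1})$. The Gaussian kernel, the bandwidth, the grid, and the function names are hypothetical choices for illustration; this is not presented as the paper's exact operator.

```python
import numpy as np


def gaussian_gram(X, Y, bandwidth):
    """Gram matrix k(x, y) = exp(-||x - y||^2 / (2 bandwidth^2)) between rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def project_mass_one(atoms, target_locs, target_weights, bandwidth=0.5):
    """Project sum_j target_weights[j] * delta(target_locs[j]) onto signed measures
    sum_i w[i] * delta(atoms[i]) with sum_i w[i] = 1, minimizing the squared MMD.

    Solves  min_w  w^T K w - 2 w^T b  subject to  1^T w = 1,
    with K = k(atoms, atoms) and b = k(atoms, target_locs) @ target_weights.
    """
    K = gaussian_gram(atoms, atoms, bandwidth)
    b = gaussian_gram(atoms, target_locs, bandwidth) @ target_weights
    ones = np.ones(len(atoms))
    K_inv_b = np.linalg.solve(K, b)
    K_inv_1 = np.linalg.solve(K, ones)
    # Lagrange multiplier chosen so that the returned weights have total mass exactly 1.
    lam = (ones @ K_inv_b - 1.0) / (ones @ K_inv_1)
    return K_inv_b - lam * K_inv_1


if __name__ == "__main__":
    grid = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)  # 3x3 atoms in R^2
    # Target: a two-point distribution whose support lies off the grid.
    w = project_mass_one(grid, np.array([[0.5, 0.5], [2.3, 1.7]]), np.array([0.6, 0.4]))
    print("weights:", np.round(w, 3))
    print("total mass:", w.sum())
```

Because individual weights may be negative, the output is a signed measure rather than a probability distribution. If the target is a Dirac mass located exactly on one of the atoms, the projection returns the corresponding one-hot weight vector.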

Abstract

In reinforcement learning (RL), the consideration of multivariate reward signals has led to fundamental advancements in multi-objective decision-making, transfer learning, and representation learning. This work introduces the first oracle-free and computationally-tractable algorithms for provably convergent multivariate distributional dynamic programming and temporal difference learning. Our convergence rates match the familiar rates in the scalar reward setting, and additionally provide new insights into the fidelity of approximate return distribution representations as a function of the reward dimension. Surprisingly, when the reward dimension is larger than $1$, we show that standard analysis of categorical TD learning fails, which we resolve with a novel projection onto the space of mass-$1$ signed measures. Finally, with the aid of our technical results and simulations, we identify…
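Categorical TD learning for multivariate returns can be illustrated with a single stochastic update. The sketch assumes each state's return distribution is represented by mass-1 (possibly signed) weights on a shared, fixed grid of atoms in $\mathbb{R}^d$. A sampled transition $(x, r, x')$ produces a bootstrapped target by mapping each atom $z$ to $r + \gamma z$, projecting back onto the grid with a kernel (MMD) projection that preserves total mass 1, and mixing with step size $\alpha$. The Gaussian kernel, bandwidth, grid, and all names are hypothetical; the paper's algorithms, projection, and step-size conditions may differ.

```python
import numpy as np


def gaussian_gram(X, Y, bandwidth):
    # Gaussian-kernel Gram matrix between the rows of X and Y (illustrative kernel choice).
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def project_mass_one(atoms, locs, weights, bandwidth):
    # MMD projection onto signed measures on `atoms` with total mass 1 (closed-form KKT solution).
    K = gaussian_gram(atoms, atoms, bandwidth)
    b = gaussian_gram(atoms, locs, bandwidth) @ weights
    K_inv_b = np.linalg.solve(K, b)
    K_inv_1 = np.linalg.solve(K, np.ones(len(atoms)))
    lam = (K_inv_b.sum() - 1.0) / K_inv_1.sum()
    return K_inv_b - lam * K_inv_1


def categorical_td_step(p, x, r, x_next, atoms, gamma, alpha, bandwidth=0.5):
    """One multivariate categorical TD update from a sampled transition (x, r, x_next).

    p:     (S, m) array; row p[s] holds mass-1 (possibly signed) weights on `atoms`.
    atoms: (m, d) fixed grid shared by all states.
    r:     (d,) reward vector observed on the transition.
    """
    # Bootstrapped target: move each next-state atom z to r + gamma * z ...
    target_locs = r + gamma * atoms
    # ... then project back onto the fixed grid while keeping total mass 1.
    target = project_mass_one(atoms, target_locs, p[x_next], bandwidth)
    new_p = p.copy()
    new_p[x] = (1.0 - alpha) * p[x] + alpha * target
    return new_p


if __name__ == "__main__":
    atoms = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float)  # 4x4 grid in R^2
    p = np.full((2, len(atoms)), 1.0 / len(atoms))  # two states, uniform initial weights
    p = categorical_td_step(p, x=0, r=np.array([1.0, 0.0]), x_next=1,
                            atoms=atoms, gamma=0.9, alpha=0.1)
    print("total mass at state 0:", p[0].sum())
```

Since both the old weights and the projected target have total mass 1, any convex combination of them does too, so the update stays within the mass-1 signed measures on the grid.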

Videos

Foundations of Multivariate Distributional Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Food Supply Chain Traceability · Statistical and Computational Modeling