A Differential Perspective on Distributional Reinforcement Learning
Juan Sebastian Rojas, Chi-Guhn Lee

TL;DR
This paper extends distributional reinforcement learning to average-reward settings, introducing algorithms that learn long-term reward distributions and outperform non-distributional methods in certain scenarios.
Contribution
It develops the first distributional RL algorithms for average-reward MDPs. These include proven-convergent tabular algorithms for prediction and control, and a broader family of algorithms with appealing scaling properties.
Findings
The algorithms learn and/or optimize the long-run per-step reward distribution and the differential return distribution.
The distributional algorithms are competitive with, and sometimes superior to, their non-distributional counterparts.
Rich information about reward distributions is captured.
Abstract
To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a discounted sum of rewards over time. In this work, we extend distributional RL to the average-reward setting, where an agent aims to optimize the reward received per time step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution, as well as the differential return distribution of an average-reward MDP. We derive proven-convergent tabular algorithms for both prediction and control, as well as a broader family of algorithms that have appealing scaling properties. Empirically, we find that these algorithms yield competitive and sometimes superior performance when compared to their non-distributional…
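The abstract describes a quantile-based approach to learning the long-run per-step reward distribution. The snippet below is a minimal sketch of that idea under stated assumptions, not the authors' algorithm. It tracks a fixed set of quantiles of the per-step reward from a single stream of experience, using the standard quantile-regression subgradient update θ_i ← θ_i + α(τ_i − 1{r < θ_i}). Several choices here are hypothetical and made only for illustration: the two-state Markov reward process, its switching probability, the step size, and the helper names `quantile_levels`, `simulate_rewards`, and `track_per_step_quantiles`. The process is built so that its long-run per-step reward distribution is U(0, 2). That means the true quantiles and the average reward (1.0) are known in advance.

```python
import random


def quantile_levels(n):
    """Quantile midpoints tau_i = (2i + 1) / (2n), as used in quantile-regression RL."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def simulate_rewards(num_steps, seed=0):
    """Hypothetical two-state ergodic Markov reward process (not from the paper).

    The state stays put with probability 0.7 and switches with probability 0.3,
    so the stationary distribution is (0.5, 0.5). In state s the reward is
    uniform on [s, s + 1), so the long-run per-step reward distribution is U(0, 2).
    """
    rng = random.Random(seed)
    state = 0
    for _ in range(num_steps):
        yield state + rng.random()
        if rng.random() < 0.3:
            state = 1 - state


def track_per_step_quantiles(rewards, n_quantiles=5, step_size=0.002):
    """Incrementally estimate quantiles of the per-step reward distribution.

    Each estimate takes a pinball-loss subgradient step:
    theta_i += step_size * (tau_i - 1{r < theta_i}).
    """
    taus = quantile_levels(n_quantiles)
    theta = [0.0] * n_quantiles
    for r in rewards:
        for i, tau in enumerate(taus):
            theta[i] += step_size * (tau - (1.0 if r < theta[i] else 0.0))
    return theta


theta = track_per_step_quantiles(simulate_rewards(200_000))
# For U(0, 2), the mean of the midpoint quantiles equals the mean (average reward).
avg_reward_estimate = sum(theta) / len(theta)
print([round(q, 2) for q in theta])   # should be near [0.2, 0.6, 1.0, 1.4, 1.8]
print(round(avg_reward_estimate, 2))  # should be near the true average reward 1.0
```

Averaging the quantile estimates recovers the average reward here only because the midpoint quantiles of U(0, 2) average exactly to its mean. For other distributions this is an approximation. With a constant step size the estimates keep fluctuating around the true quantiles. A decaying step-size schedule is the usual choice when convergence, rather than tracking, is required.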
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control
