A Differential Perspective on Distributional Reinforcement Learning

Juan Sebastian Rojas; Chi-Guhn Lee

arXiv:2506.03333 · cs.LG · January 14, 2026


TL;DR

This paper extends distributional reinforcement learning to average-reward settings, introducing algorithms that learn long-term reward distributions and outperform non-distributional methods in certain scenarios.

Contribution

It develops the first algorithms for distributional RL in average-reward MDPs, including proven-convergent tabular methods and scalable algorithms.

Findings

01. Algorithms learn and optimize long-run reward distributions.

02. Distributional methods outperform non-distributional counterparts in some cases.

03. Rich information about reward distributions is captured.

Abstract

To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a discounted sum of rewards over time. In this work, we extend distributional RL to the average-reward setting, where an agent aims to optimize the reward received per time step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution, as well as the differential return distribution of an average-reward MDP. We derive proven-convergent tabular algorithms for both prediction and control, as well as a broader family of algorithms that have appealing scaling properties. Empirically, we find that these algorithms yield competitive and sometimes superior performance when compared to their non-distributional…
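The quantile-based idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it is the standard stochastic quantile-regression update applied to a plain stream of rewards, and it assumes the rewards are drawn independently from a fixed distribution. It omits the MDP, the policy, and the differential return entirely. The function names (`quantile_levels`, `update_quantiles`, `estimate_reward_quantiles`), the midpoint quantile levels, and the step size are hypothetical choices made for illustration.

```python
import random


def quantile_levels(m):
    """Midpoint quantile levels tau_i = (2i + 1) / (2m) for i = 0..m-1 (an assumed choice)."""
    return [(2 * i + 1) / (2 * m) for i in range(m)]


def update_quantiles(theta, taus, reward, step_size):
    """Apply one stochastic quantile-regression step to every estimate in place.

    Each estimate moves up by step_size * tau when the reward is at or above it,
    and down by step_size * (1 - tau) when the reward is below it.
    """
    for i, tau in enumerate(taus):
        indicator = 1.0 if reward < theta[i] else 0.0
        theta[i] += step_size * (tau - indicator)
    return theta


def estimate_reward_quantiles(rewards, m=5, step_size=0.01):
    """Estimate m quantiles of the per-step reward distribution from a reward stream."""
    taus = quantile_levels(m)
    theta = [0.0] * m  # arbitrary initial estimates
    for r in rewards:
        update_quantiles(theta, taus, r, step_size)
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical reward stream: uniform on [0, 1], so the true quantiles equal the levels.
    rewards = [rng.uniform(0.0, 1.0) for _ in range(50_000)]
    theta = estimate_reward_quantiles(rewards, m=5, step_size=0.005)
    print("quantile estimates:", [round(t, 3) for t in theta])
    # The mean of the midpoint quantile estimates approximates the average reward.
    print("implied average reward:", round(sum(theta) / len(theta), 3))
```

With midpoint levels, the mean of the quantile estimates gives a rough estimate of the average reward, which is one way a quantile representation can carry both the distribution and its mean. With a constant step size the estimates keep fluctuating around the true quantiles rather than settling exactly.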



Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control