Optimizing Return Distributions with Distributional Dynamic Programming

Bernardo Ávila Pires; Mark Rowland; Diana Borsa; Zhaohan Daniel Guo; Khimya Khetarpal; André Barreto; David Abel; Rémi Munos; Will Dabney

arXiv:2501.13028·cs.LG·August 5, 2025

Open Access

TL;DR

This paper develops distributional dynamic programming methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case, and supports them with theoretical analysis and a DQN-based agent.

Contribution

The paper combines distributional DP with stock augmentation, which lets it optimize risk-sensitive objectives beyond the expected utilities that earlier distributional DP methods and classic DP could handle.

Findings

01. Successfully formulated risk-sensitive problems as stock-augmented return distribution optimization.

02. Provided theoretical bounds and analysis for distributional value and policy iteration.

03. Empirically validated the approach with a DQN-based agent on multiple applications.

Abstract

We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained since the first time step. We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number…
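The stock augmentation described in the abstract can be illustrated with a minimal sketch. It assumes the tracked statistic is a discounted running sum of the rewards obtained so far, which is one possible choice and not necessarily the paper's exact definition. The names `StockAugmentedEnv`, `StepFn`, and `chain_step` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical environment dynamics: (state, action) -> (next_state, reward, done).
StepFn = Callable[[int, int], Tuple[int, float, bool]]


@dataclass
class StockAugmentedEnv:
    """Wraps an environment so that observations are (state, stock) pairs.

    Assumption: the stock is the discounted sum of rewards received
    since the first time step, starting from `initial_stock`.
    """

    step_fn: StepFn
    initial_state: int
    gamma: float = 1.0
    initial_stock: float = 0.0

    def reset(self) -> Tuple[int, float]:
        self._state = self.initial_state
        self._stock = self.initial_stock
        self._discount = 1.0  # gamma ** t for the current time step t
        return (self._state, self._stock)

    def step(self, action: int) -> Tuple[Tuple[int, float], float, bool]:
        next_state, reward, done = self.step_fn(self._state, action)
        # Fold the new reward into the stock before exposing the next observation.
        self._stock += self._discount * reward
        self._discount *= self.gamma
        self._state = next_state
        return (next_state, self._stock), reward, done


def chain_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy chain: the reward equals the action, and the episode ends at state 3."""
    next_state = state + 1
    return next_state, float(action), next_state >= 3
```

A policy acting on the augmented observation can condition on the rewards accumulated so far, not only on the underlying state. For instance, `StockAugmentedEnv(chain_step, initial_state=0, gamma=0.5)` exposes the pair `(state, stock)` after every call to `step`.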

Taxonomy

Topics: Auction Theory and Applications

Methods: Q-Learning · Dense Connections · Convolution · Deep Q-Network