Optimizing Return Distributions with Distributional Dynamic Programming
Bernardo Ávila Pires, Mark Rowland, Diana Borsa, Zhaohan Daniel Guo, Khimya Khetarpal, André Barreto, David Abel, Rémi Munos, Will Dabney

TL;DR
This paper develops distributional dynamic programming (DP) methods that optimize statistical functionals of the return distribution, with standard reinforcement learning as a special case, and supports them with theoretical analysis and practical algorithms.
Contribution
It combines distributional DP with stock augmentation, a technique previously used with classic DP in risk-sensitive RL, to optimize objectives beyond the expected utilities that previous distributional DP methods could handle.
Findings
Formulates a number of recently studied problems as stock-augmented return distribution optimization.
Provides bounds and analysis for distributional value and policy iteration, including which objectives these methods can or cannot optimize.
Empirically validates the approach with a DQN-based agent on multiple applications.
Abstract
We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained since the first time step. We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number…
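The stock augmentation described in the abstract can be illustrated with a minimal sketch. It assumes one particular convention for the stock, namely the update c' = (c + r) / gamma, so that gamma^t times the stock equals the initial stock plus the discounted rewards collected so far; the abstract only says the state is augmented with "a statistic of the rewards obtained since the first time step", so this exact rule is an assumption. The names `AugmentedState`, `augment`, `toy_step`, and `rollout` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Tuple

State = Hashable
# A step function maps (state, action) to (next_state, reward, done).
Step = Callable[[State, int], Tuple[State, float, bool]]


@dataclass(frozen=True)
class AugmentedState:
    """Original MDP state paired with a running reward statistic (the stock)."""
    state: State
    stock: float


def augment(step: Step, gamma: float):
    """Wrap a step function so it also tracks the stock.

    Assumed update rule (not confirmed by the abstract):
    next_stock = (stock + reward) / gamma.
    """
    def augmented_step(aug: AugmentedState, action: int):
        next_state, reward, done = step(aug.state, action)
        next_stock = (aug.stock + reward) / gamma
        return AugmentedState(next_state, next_stock), reward, done
    return augmented_step


def toy_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical 3-step chain: the reward equals the action taken."""
    next_state = state + 1
    return next_state, float(action), next_state >= 3


def rollout(actions: List[int], gamma: float, initial_stock: float = 0.0):
    """Run the augmented toy chain; return the final stock and discounted return."""
    step = augment(toy_step, gamma)
    aug = AugmentedState(state=0, stock=initial_stock)
    discounted_return, discount = 0.0, 1.0
    for action in actions:
        aug, reward, done = step(aug, action)
        discounted_return += discount * reward
        discount *= gamma
        if done:
            break
    return aug.stock, discounted_return, discount


final_stock, g, discount = rollout([1, 0, 2], gamma=0.5)
# Under the assumed update, discount * final_stock recovers the discounted return.
print(discount * final_stock, g)
```

Because the stock is part of the state, a policy defined on `AugmentedState` can condition its choices on the rewards already collected, which a policy on the original state alone cannot do.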
Taxonomy
Topics: Auction Theory and Applications
Methods: Q-Learning · Dense Connections · Convolution · Deep Q-Network
