Robustness and risk management via distributional dynamic programming
Mastane Achab, Gergely Neu

TL;DR
This paper introduces a distributional dynamic programming approach that improves robustness and risk management in reinforcement learning, using an augmented state space to distinguish safe policies from risky ones.
Contribution
It proposes a novel class of distributional operators and a practical DP algorithm with a robust MDP interpretation, enabling better risk-sensitive decision making.
Findings
New distributional operators with robust MDP interpretation
A DP algorithm for policy evaluation in risk-sensitive settings
Ability to distinguish safe from risky optimal actions
Abstract
In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP). More generally in distributional reinforcement learning (DRL), the focus is on the whole distribution of the return, not just its expectation. Although DRL-based methods produced state-of-the-art performance in RL with function approximation, they involve additional quantities (compared to the non-distributional setting) that are still not well understood. As a first contribution, we introduce a new class of distributional operators, together with a practical DP algorithm for policy evaluation, that come with a robust MDP interpretation. Indeed, our approach reformulates through an augmented state space where each state is split into a worst-case substate and a…
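To illustrate the difference between evaluating the expected return and evaluating the whole return distribution, here is a minimal Python sketch of standard distributional policy evaluation on a toy problem. It is not the class of operators proposed in the paper. The state names (`safe`, `risky`, `end`), the transition table, and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def distributional_evaluation(transitions, gamma, n_iters):
    """Iterate the standard distributional Bellman backup for a fixed policy.

    transitions[s] is a list of (prob, reward, next_state) tuples.
    A state with an empty list is treated as terminal (return 0).
    Each return distribution is stored as a dict {return_value: probability}.
    """
    dist = {s: {0.0: 1.0} for s in transitions}
    for _ in range(n_iters):
        new_dist = {}
        for s, outcomes in transitions.items():
            if not outcomes:
                new_dist[s] = {0.0: 1.0}
                continue
            acc = defaultdict(float)
            for prob, reward, nxt in outcomes:
                # Shift and scale the successor's return distribution.
                for z, pz in dist[nxt].items():
                    acc[reward + gamma * z] += prob * pz
            new_dist[s] = dict(acc)
        dist = new_dist
    return dist


def mean(d):
    """Expected return of a distribution {value: probability}."""
    return sum(z * p for z, p in d.items())


def worst_case(d):
    """Smallest return that has positive probability."""
    return min(z for z, p in d.items() if p > 0)


# Hypothetical toy problem: two start states with the same expected return.
transitions = {
    "safe": [(1.0, 1.0, "end")],                      # always reward 1
    "risky": [(0.5, 0.0, "end"), (0.5, 2.0, "end")],  # reward 0 or 2
    "end": [],
}

dist = distributional_evaluation(transitions, gamma=0.5, n_iters=2)
for s in ("safe", "risky"):
    print(s, "mean:", mean(dist[s]), "worst case:", worst_case(dist[s]))
```

In this toy problem both start states have the same expected return, so a purely expectation-based evaluation cannot tell them apart, while the return distributions differ in their worst case. The dictionary representation is exact only for small problems: with cycles in the transition table the support grows at each iteration, so practical methods use a fixed parametric or particle-based representation instead.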
Taxonomy
Topics: Reinforcement Learning in Robotics
