Robustness and risk management via distributional dynamic programming

Mastane Achab; Gergely Neu

arXiv:2112.15430·cs.LG·January 3, 2022

TL;DR

This paper introduces a distributional dynamic programming approach to robustness and risk management in reinforcement learning. Working on an augmented state space, it can distinguish safe from risky optimal actions.

Contribution

The paper proposes a new class of distributional operators, together with a practical DP algorithm for policy evaluation, that come with a robust MDP interpretation and target risk-sensitive decision making.

Findings

01. New distributional operators with a robust MDP interpretation

02. A DP algorithm for policy evaluation in risk-sensitive settings

03. The ability to distinguish safe from risky optimal actions

Abstract

In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP). More generally in distributional reinforcement learning (DRL), the focus is on the whole distribution of the return, not just its expectation. Although DRL-based methods produced state-of-the-art performance in RL with function approximation, they involve additional quantities (compared to the non-distributional setting) that are still not well understood. As a first contribution, we introduce a new class of distributional operators, together with a practical DP algorithm for policy evaluation, that come with a robust MDP interpretation. Indeed, our approach reformulates through an augmented state space where each state is split into a worst-case substate and a…
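To make the contrast between expected return and return distribution concrete, here is a minimal sketch of policy evaluation with the standard distributional Bellman operator. It is not the new class of operators introduced in the paper. The two-state process, the names `TRANSITIONS`, `bellman_backup`, `evaluate`, and `mean`, the discount factor, and the iteration count are all illustrative assumptions. Both states have the same expected return, yet their return distributions and worst-case returns differ.

```python
from collections import defaultdict

# Hypothetical toy process under a fixed policy (an assumption for illustration).
# TRANSITIONS[state] is a list of (probability, reward, next_state).
TRANSITIONS = {
    "safe": [(1.0, 1.0, "safe")],                       # always reward 1
    "risky": [(0.5, 0.0, "risky"), (0.5, 2.0, "risky")],  # reward 0 or 2
}


def bellman_backup(eta, transitions, gamma):
    """Apply the standard distributional Bellman operator once.

    eta maps each state to a return distribution {return_value: probability}.
    """
    new_eta = {}
    for state, outcomes in transitions.items():
        dist = defaultdict(float)
        for prob, reward, next_state in outcomes:
            # Shift and scale the next state's return distribution.
            for ret, p in eta[next_state].items():
                dist[reward + gamma * ret] += prob * p
        new_eta[state] = dict(dist)
    return new_eta


def evaluate(transitions, gamma, num_iters):
    """Iterate the backup starting from a point mass at zero return."""
    eta = {state: {0.0: 1.0} for state in transitions}
    for _ in range(num_iters):
        eta = bellman_backup(eta, transitions, gamma)
    return eta


def mean(dist):
    """Expected value of a finite distribution {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


eta = evaluate(TRANSITIONS, gamma=0.5, num_iters=3)
for state, dist in eta.items():
    print(state, "mean:", mean(dist), "worst case:", min(dist))
```

An agent that only tracks the mean cannot tell the two states apart, whereas the full distribution exposes the lower worst-case return of the risky one. The support of the distribution grows with each backup, so this exact enumeration is only practical for very small examples.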

Taxonomy

Topics: Reinforcement Learning in Robotics