Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures

Xian Yu; Siqian Shen

arXiv:2301.05981·cs.LG·January 18, 2023

TL;DR

This paper introduces a risk-averse reinforcement learning framework based on dynamic, time-consistent risk measures. It reformulates the risk-averse problem as a risk-neutral one so that standard value-based methods apply, with the aim of improving robustness and reducing variance in decision-making under uncertainty.

Contribution

It adapts expected conditional risk measures to infinite-horizon MDPs, proves their time consistency, and develops a risk-averse deep Q-learning algorithm.

Findings

01. Risk-averse RL reduces the variance of outcomes.

02. The Bellman operator of the reformulated problem is a contraction mapping, which guarantees convergence of value-based RL algorithms.

03. Numerical results show enhanced robustness in simple MDPs.
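The convergence claim rests on the Bellman operator being a contraction. Below is a minimal sketch of the standard argument. It assumes the reformulated operator has the generic form shown, where the augmented action is (a, η), the manipulated reward r̃ is bounded and does not depend on Q, η ranges over a compact set so the maxima exist, and 0 ≤ γ < 1. The symbols and the exact form of the operator are assumptions for illustration, not the paper's notation.

```latex
(\mathcal{T}Q)(s,a,\eta) = \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\Big[\tilde r(s,a,\eta,s') + \gamma \max_{a',\eta'} Q(s',a',\eta')\Big]

\begin{aligned}
\big|(\mathcal{T}Q_1)(s,a,\eta) - (\mathcal{T}Q_2)(s,a,\eta)\big|
  &= \gamma \Big|\mathbb{E}_{s'}\Big[\max_{a',\eta'} Q_1(s',a',\eta') - \max_{a',\eta'} Q_2(s',a',\eta')\Big]\Big| \\
  &\le \gamma\, \mathbb{E}_{s'}\Big[\max_{a',\eta'} \big|Q_1(s',a',\eta') - Q_2(s',a',\eta')\big|\Big] \\
  &\le \gamma\, \lVert Q_1 - Q_2 \rVert_\infty
\end{aligned}
```

The reward term cancels because it does not depend on Q, and the middle step uses |max f − max g| ≤ max |f − g|. Taking the supremum over (s, a, η) gives ‖𝒯Q₁ − 𝒯Q₂‖∞ ≤ γ‖Q₁ − Q₂‖∞. The Banach fixed-point theorem then yields a unique fixed point to which value iteration converges.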

Abstract

Traditional reinforcement learning (RL) aims to maximize the expected total reward, while the risk of uncertain outcomes needs to be controlled to ensure reliable performance in a risk-averse setting. In this paper, we consider the problem of maximizing a dynamic risk measure of a sequence of rewards in infinite-horizon Markov Decision Processes (MDPs). We adapt the Expected Conditional Risk Measures (ECRMs) to the infinite-horizon risk-averse MDP and prove their time consistency. Using a convex combination of expectation and conditional value-at-risk (CVaR) as a special one-step conditional risk measure, we reformulate the risk-averse MDP as a risk-neutral counterpart with an augmented action space and modified immediate rewards. We further prove that the related Bellman operator is a contraction mapping, which guarantees the convergence of any value-based RL algorithm. Accordingly, we…
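Reformulations of this kind typically rest on a standard identity due to Rockafellar and Uryasev. Take a reward R, a lower-tail probability α in (0, 1), and a weight λ in [0, 1]. Then (1 − λ)E[R] + λ·CVaR_α(R) = max over η of E[(1 − λ)R + λ(η − (η − R)₊/α)]. Treating η as an extra action component turns the risk measure into the expectation of a modified reward. This is one way to read "augmented action space and modified immediate rewards." The sketch below checks this one-step identity numerically on a discrete reward distribution. It makes three assumptions for illustration: the function names, the lower-tail CVaR convention for rewards, and the per-sample reward formula. The paper's multi-stage ECRM construction, including when η is chosen and how discounting enters, may differ.

```python
import math
from typing import Sequence, Tuple


def cvar_lower_tail(values: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """CVaR of a reward computed directly: the mean of the worst alpha-probability mass."""
    remaining, total = alpha, 0.0
    for v, p in sorted(zip(values, probs)):  # ascending order: worst rewards first
        take = min(p, remaining)
        total += take * v
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def manipulated_reward(r: float, eta: float, lam: float, alpha: float) -> float:
    """Hypothetical per-sample reward once eta is treated as part of the action."""
    return (1.0 - lam) * r + lam * (eta - max(eta - r, 0.0) / alpha)


def mixed_risk_via_eta(values, probs, lam, alpha) -> Tuple[float, float]:
    """Maximize E[manipulated_reward] over eta.

    For lam > 0 and 0 < alpha < 1 the objective is concave and piecewise
    linear in eta, with kinks at the support points, and it tends to -inf as
    |eta| grows, so searching the support points finds an exact maximizer.
    """
    best_eta, best_val = values[0], -math.inf
    for eta in values:
        val = sum(p * manipulated_reward(r, eta, lam, alpha) for r, p in zip(values, probs))
        if val > best_val:
            best_eta, best_val = eta, val
    return best_eta, best_val


def identity_holds(values, probs, lam, alpha) -> bool:
    """Compare the eta-reformulation with (1 - lam) * E[R] + lam * CVaR_alpha(R)."""
    mean = sum(p * r for r, p in zip(values, probs))
    direct = (1.0 - lam) * mean + lam * cvar_lower_tail(values, probs, alpha)
    _, via_eta = mixed_risk_via_eta(values, probs, lam, alpha)
    return math.isclose(direct, via_eta, rel_tol=1e-9, abs_tol=1e-9)


if __name__ == "__main__":
    vals, probs = [-10.0, 0.0, 5.0, 10.0], [0.1, 0.2, 0.3, 0.4]
    print(mixed_risk_via_eta(vals, probs, lam=0.5, alpha=0.2))  # eta = 0.0, value close to -0.25
    print(identity_holds(vals, probs, lam=0.5, alpha=0.2))       # True
```

In this example the maximizing η equals the reward's α-quantile (value-at-risk), which is 0.0. The risk-adjusted value is 0.5 × 4.5 + 0.5 × (−5) = −0.25, noticeably below the plain expectation of 4.5.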

Taxonomy

Methods: Q-Learning