Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures
Xian Yu, Siqian Shen

TL;DR
This paper introduces a risk-averse reinforcement learning framework using dynamic time-consistent risk measures, reformulating the problem to improve robustness and reduce variance in decision-making under uncertainty.
Contribution
It adapts expected conditional risk measures to infinite-horizon MDPs, proves their time consistency, and develops a risk-averse deep Q-learning algorithm.
Findings
Risk-averse RL reduces variance in outcomes.
The Bellman operator of the reformulated problem is a contraction mapping, which guarantees convergence of value-based RL algorithms.
Numerical results show enhanced robustness in simple MDPs.
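The convergence finding rests on a contraction argument. As a rough illustration only, the sketch below checks the contraction property numerically for the standard risk-neutral Bellman optimality operator on a small random MDP. It is not the paper's risk-averse operator; the MDP sizes, the discount factor `gamma`, and the function name `bellman_optimality` are assumptions made for this example.

```python
import numpy as np

def bellman_optimality(V, P, R, gamma):
    """Standard (risk-neutral) Bellman optimality operator.

    P has shape (A, S, S): P[a, s, t] is the probability of moving s -> t under a.
    R has shape (S, A): immediate reward for taking a in s.
    """
    # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q.max(axis=1)

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9  # hypothetical toy sizes and discount factor

# Random transition kernel, normalised so each row is a distribution.
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))

# Two arbitrary value functions.
V1 = rng.normal(size=S)
V2 = rng.normal(size=S)

before = np.max(np.abs(V1 - V2))
after = np.max(np.abs(bellman_optimality(V1, P, R, gamma)
                      - bellman_optimality(V2, P, R, gamma)))

# A gamma-contraction in the sup norm satisfies after <= gamma * before.
print(f"sup-norm distance before: {before:.4f}, after: {after:.4f}")
```

Because the distance shrinks by at least a factor of `gamma` at every application, repeated application converges to a unique fixed point, which is the property value-based methods such as Q-learning rely on.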
Abstract
Traditional reinforcement learning (RL) aims to maximize the expected total reward, while the risk of uncertain outcomes needs to be controlled to ensure reliable performance in a risk-averse setting. In this paper, we consider the problem of maximizing dynamic risk of a sequence of rewards in infinite-horizon Markov Decision Processes (MDPs). We adapt the Expected Conditional Risk Measures (ECRMs) to the infinite-horizon risk-averse MDP and prove its time consistency. Using a convex combination of expectation and conditional value-at-risk (CVaR) as a special one-step conditional risk measure, we reformulate the risk-averse MDP as a risk-neutral counterpart with augmented action space and manipulation on the immediate rewards. We further prove that the related Bellman operator is a contraction mapping, which guarantees the convergence of any value-based RL algorithm. Accordingly, we…
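To make the one-step risk measure in the abstract concrete, here is a minimal sketch of a convex combination of expectation and CVaR evaluated on a sample of rewards. The conventions are assumptions, not taken from the paper: `alpha` is the fraction of worst outcomes, CVaR is written for rewards (lower tail) in the Rockafellar-Uryasev form max over `eta` of `eta - E[(eta - X)^+] / alpha`, and `lam` is the weight on CVaR. The function names are hypothetical. The auxiliary variable `eta` is the kind of extra decision that an augmented action space could carry, though the abstract does not spell out the construction.

```python
import numpy as np

def cvar_lower_tail(rewards, alpha):
    """Empirical lower-tail CVaR of rewards at tail fraction alpha.

    Uses the variational form max_eta { eta - E[(eta - X)^+] / alpha }.
    The objective is piecewise linear and concave in eta, so for an
    empirical distribution the maximum is attained at a sample point.
    """
    x = np.asarray(rewards, dtype=float)
    etas = x  # candidate values of the auxiliary variable eta
    # shortfall[i] = mean over samples of max(etas[i] - x, 0)
    shortfall = np.maximum(etas[:, None] - x[None, :], 0.0).mean(axis=1)
    return float(np.max(etas - shortfall / alpha))

def mean_cvar(rewards, lam, alpha):
    """Convex combination (1 - lam) * E[X] + lam * CVaR_alpha(X)."""
    x = np.asarray(rewards, dtype=float)
    return (1.0 - lam) * float(x.mean()) + lam * cvar_lower_tail(x, alpha)

rewards = np.arange(10.0)  # toy sample: 0, 1, ..., 9
risk_neutral = mean_cvar(rewards, lam=0.0, alpha=0.2)  # plain expectation
risk_averse = mean_cvar(rewards, lam=0.5, alpha=0.2)   # half weight on the tail
print(risk_neutral, risk_averse)
```

Setting `lam = 0` recovers the risk-neutral objective, while larger `lam` puts more weight on the worst `alpha` fraction of outcomes and lowers the value assigned to a high-variance reward distribution.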
Taxonomy
Methods: Q-Learning
