Regret Bounds for Risk-Sensitive Reinforcement Learning
O. Bastani, Y. J. Ma, E. Shen, W. Xu

TL;DR
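This paper proves the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives, including CVaR. The analysis rests on a novel characterization of the CVaR objective and a novel optimistic MDP construction, and is motivated by safety-critical applications such as healthcare and robotics.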
Contribution
The paper proves the first regret bounds for RL under a general class of risk-sensitive objectives that includes CVaR, based on a new characterization of the CVaR objective and a new optimistic MDP construction.
Findings
First regret bounds for CVaR-based RL.
Novel characterization of risk-sensitive objectives.
New optimistic MDP construction for analysis.
Abstract
In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.
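To make the contrast between expected reward and a tail-focused objective concrete, here is a minimal sketch of an empirical CVaR estimate. It assumes the common lower-tail convention, in which CVaR at level alpha is the average of the worst alpha-fraction of returns. The function name `empirical_cvar` and the sample returns are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's characterization or its optimistic MDP construction.

```python
import math


def empirical_cvar(returns, alpha):
    """Average of the worst ceil(alpha * n) returns (lower-tail convention).

    Illustrative helper, not the paper's method.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)                      # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the tail
    return sum(ordered[:k]) / k


# Hypothetical episode returns: mostly good, with one rare bad outcome.
returns = [10.0, 9.0, 8.0, 7.0, -20.0]
mean_return = sum(returns) / len(returns)   # averages over all episodes
cvar_20 = empirical_cvar(returns, 0.2)      # averages over the worst 20% only
print(mean_return, cvar_20)
```

On this sample the mean is positive, while the CVaR at level 0.2 equals the single worst return. That gap is what a risk-sensitive objective is meant to expose.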
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
