Regret Bounds for Risk-Sensitive Reinforcement Learning

O. Bastani; Y. J. Ma; E. Shen; W. Xu

arXiv:2210.05650 · cs.LG · October 12, 2022 · 1 citation



TL;DR

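This paper establishes the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives, including CVaR. The analysis rests on a novel characterization of the CVaR objective and a novel optimistic MDP construction, and is motivated by safety-critical applications such as healthcare and robotics.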

Contribution

It proves the first regret bounds for risk-sensitive RL under a general class of objectives that includes CVaR, building on a new characterization of the CVaR objective and a new optimistic MDP construction.

Findings

01

First regret bounds for CVaR-based RL.

02

Novel characterization of the CVaR objective.

03

New optimistic MDP construction for analysis.

Abstract

In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.
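The following minimal Python sketch makes "tail outcomes rather than expected reward" concrete by computing the CVaR of a sample of episode returns. It assumes the lower-tail convention, where CVaR at level alpha is the average of the worst alpha fraction of returns, and it assumes the samples are equally weighted. The sketch computes this quantity two ways: directly as a tail average, and through the well-known Rockafellar-Uryasev variational form. That standard identity appears here only for illustration. It is not the paper's novel characterization, and the function names are hypothetical.

```python
import math


def cvar_tail_average(returns, alpha):
    """Lower-tail CVaR of equally weighted samples: (1/alpha) * integral_0^alpha F^{-1}(u) du."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    z = sorted(float(r) for r in returns)
    n = len(z)
    mass = alpha * n                # tail mass, measured in units of samples
    k = math.floor(mass)            # samples lying entirely inside the tail
    total = sum(z[:k])
    if k < n:
        total += (mass - k) * z[k]  # fractional share of the sample straddling the quantile
    return total / mass


def cvar_variational(returns, alpha):
    """Same quantity via sup_tau { tau - E[(tau - Z)^+] / alpha } (Rockafellar-Uryasev form).

    For an empirical distribution this objective is concave and piecewise linear in tau,
    with kinks at the samples, so evaluating it at each sample finds the supremum.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    z = [float(r) for r in returns]
    n = len(z)

    def objective(tau):
        shortfall = sum(max(tau - zj, 0.0) for zj in z) / n  # E[(tau - Z)^+]
        return tau - shortfall / alpha

    return max(objective(tau) for tau in z)


# Hypothetical return samples for two policies.
risky = [10.0] * 9 + [-50.0]  # high return most of the time, rare disaster
safe = [3.0] * 10             # modest but reliable return
for name, r in [("risky", risky), ("safe", safe)]:
    print(f"{name}: mean={sum(r) / len(r):.1f}, CVaR_0.1={cvar_tail_average(r, 0.1):.1f}")
# risky: mean=4.0, CVaR_0.1=-50.0
# safe: mean=3.0, CVaR_0.1=3.0
```

On these made-up samples, the expected return prefers the risky policy, but CVaR at alpha = 0.1 prefers the safe one because it scores each policy by its worst 10% of outcomes. This is the kind of preference a risk-sensitive objective is meant to capture.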


Videos

Regret Bounds for Risk-Sensitive Reinforcement Learning · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)