Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

Kaiwen Wang; Nathan Kallus; Wen Sun

arXiv:2302.03201 · cs.LG · May 26, 2023 · 1 citation

TL;DR

This paper develops near-minimax-optimal algorithms for risk-sensitive reinforcement learning under the CVaR objective. It gives tight regret bounds for multi-arm bandits and tabular MDPs, attained by an Upper Confidence Bound algorithm with a Bernstein bonus and by a bonus-driven Value Iteration procedure.

Contribution

The paper introduces algorithms with optimal or near-optimal regret bounds for CVaR-based RL: an Upper Confidence Bound algorithm with a Bernstein bonus for bandits and a bonus-driven Value Iteration procedure for tabular MDPs. These improve on existing bounds.

Findings

01

Achieves the minimax CVaR regret rate of $\Omega(\sqrt{\tau^{-1} A K})$ in bandits.

02

Establishes a minimax regret lower bound of $\Omega(\sqrt{\tau^{-1} S A K})$ in tabular MDPs (with normalized cumulative rewards).

03

Proposes algorithms that attain near-optimal regret bounds under CVaR risk measure.

Abstract

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1} A K})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a novel Bernstein bonus. For online RL in tabular Markov Decision Processes (MDPs), we show a minimax regret lower bound of $\Omega(\sqrt{\tau^{-1} S A K})$ (with normalized cumulative rewards), where $S$ is the number of states, and we propose a novel bonus-driven Value Iteration procedure. We show that our algorithm achieves the optimal regret of $\widetilde{O}(\sqrt{\tau^{-1} S A K})$ under a continuity assumption and in general attains a near-optimal regret of $\widetilde…
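
To make the CVaR objective concrete, here is a minimal sketch of an empirical CVaR estimate at risk tolerance $\tau$ for rewards, using the standard variational form $\mathrm{CVaR}_\tau(X) = \sup_b \{ b - \tau^{-1} \mathbb{E}[(b - X)^+] \}$. The function name `empirical_cvar` and the sample data are assumptions made for illustration. This is not the paper's algorithm, only the quantity its regret is measured against.

```python
import numpy as np


def empirical_cvar(returns, tau):
    """Empirical CVaR at risk tolerance tau (lower tail, higher reward is better).

    Hypothetical helper for illustration. It evaluates the objective
    b - mean((b - X)^+) / tau at every sample point b and takes the maximum.
    The objective is concave and piecewise linear in b with kinks at the
    samples, so the maximum over all real b is attained at a sample point.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    x = np.asarray(returns, dtype=float)
    if x.size == 0:
        raise ValueError("returns must be non-empty")
    candidates = [b - np.mean(np.maximum(b - x, 0.0)) / tau for b in x]
    return float(max(candidates))


# Example with normalized cumulative rewards in [0, 1].
samples = [0.0, 0.5, 1.0, 1.0]
print(empirical_cvar(samples, tau=0.5))  # average of the worst half of outcomes
print(empirical_cvar(samples, tau=1.0))  # tau = 1 recovers the plain mean
```

Smaller values of `tau` weight only the worst outcomes, which is why the regret bounds above grow as $\tau$ shrinks.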

Videos

Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization