Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents

Wenhao Xu; Xuefeng Gao; Xuedong He

arXiv:2301.12601 · cs.LG · June 9, 2023 · 5 citations

TL;DR

This paper introduces a risk-sensitive reinforcement learning framework for Markov decision processes based on recursive optimized certainty equivalents (OCEs), and gives a learning algorithm whose regret has optimal dependence on the number of episodes and the number of actions.

Contribution

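The paper develops a new episodic risk-sensitive RL formulation for tabular MDPs with recursive OCEs and proposes an efficient algorithm, based on value iteration and upper confidence bounds, whose regret is proven to have optimal dependence on the number of episodes and the number of actions.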
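For reference, regret in episodic RL is usually defined as the cumulative gap between the optimal value and the value of the policy played in each episode. The block below states that standard definition as an assumption; the symbols (K episodes, initial states s_1^k, policies π_k, values V computed under the recursive OCE criterion) are illustrative, and the paper's own notation may differ.

```latex
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Bigl( V_1^{*}(s_1^{k}) - V_1^{\pi_k}(s_1^{k}) \Bigr)
```

Under this reading, "optimal dependence on the number of episodes and actions" means, roughly, that the algorithm's upper bound and the minimax lower bound scale in the same way with K and with the number of actions A.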

Findings

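01. The proposed algorithm achieves a regret bound with optimal dependence on the number of episodes and the number of actions.

02. An upper bound on the algorithm's regret and a minimax lower bound are both established; together they confirm this optimality in episodes and actions.

03. The OCE family covers several widely used risk measures, including CVaR, entropic risk, and mean-variance models, so the framework handles all of them.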
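CVaR, entropic risk, and mean-variance models are all OCEs of the form OCE_u(X) = sup over λ of { λ + E[u(X − λ)] } for a concave utility u. The minimal sketch below is an illustration, not the paper's code: it evaluates this supremum numerically for a discrete reward and checks it against known closed forms. The utility choices are standard ones assumed here, with X a reward so that risk aversion focuses on the lower tail: u(t) = (1 − e^(−γt))/γ for entropic risk, u(t) = min(t, 0)/α for CVaR at level α, and u(t) = t − c·t² for mean-variance. All function and variable names are hypothetical.

```python
import math
from typing import Callable, Sequence


def oce(values: Sequence[float], probs: Sequence[float],
        u: Callable[[float], float], iters: int = 200) -> float:
    """Optimized certainty equivalent of a discrete reward X (illustrative sketch).

    OCE_u(X) = sup_lam { lam + E[u(X - lam)] }.
    Assumes u is concave with u(0) = 0 and 1 in its superdifferential at 0,
    so the objective is concave in lam and a maximizer lies in [min X, max X].
    The maximizer is located by ternary search on that interval.
    """
    def objective(lam: float) -> float:
        return lam + sum(p * u(x - lam) for x, p in zip(values, probs))

    lo, hi = min(values), max(values)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1  # by concavity, a maximizer lies in [m1, hi]
        else:
            hi = m2  # by concavity, a maximizer lies in [lo, m2]
    return objective((lo + hi) / 2)


def entropic_closed_form(values, probs, gamma):
    """-(1/gamma) log E[exp(-gamma X)], the closed form of the entropic OCE."""
    return -math.log(sum(p * math.exp(-gamma * x) for x, p in zip(values, probs))) / gamma


# Illustrative reward distribution: X uniform on {0, 1, 2, 3}.
values, probs = [0.0, 1.0, 2.0, 3.0], [0.25] * 4
gamma, alpha, c = 0.5, 0.5, 0.1  # illustrative risk parameters (assumptions)

entropic = oce(values, probs, lambda t: (1 - math.exp(-gamma * t)) / gamma)
cvar = oce(values, probs, lambda t: min(t, 0.0) / alpha)  # mean of the worst alpha-fraction
mean_var = oce(values, probs, lambda t: t - c * t * t)    # E[X] - c * Var(X)

print(entropic, entropic_closed_form(values, probs, gamma))  # the two should agree
print(cvar)      # approximately 0.5, the average of the lower half {0, 1}
print(mean_var)  # approximately 1.5 - 0.1 * 1.25 = 1.375
```

Ternary search is used because the objective is concave in λ under the stated assumptions; a one-dimensional convex solver would work equally well.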

Abstract

The optimized certainty equivalent (OCE) is a family of risk measures that cover important examples such as entropic risk, conditional value-at-risk and mean-variance models. In this paper, we propose a new episodic risk-sensitive reinforcement learning formulation based on tabular Markov decision processes with recursive OCEs. We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound. We derive an upper bound on the regret of the proposed algorithm, and also establish a minimax lower bound. Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
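To make "tabular MDPs with recursive OCEs" concrete, the sketch below runs backward induction with a known model, applying a risk map to the next-step value at every stage. It assumes deterministic rewards and the recursion V_h(s) = max_a { r_h(s, a) + OCE(V_{h+1}(S')) } with S' drawn from P_h(· | s, a); the paper's exact formulation may differ in details. This is only the planning backbone: it does not implement the paper's learning algorithm, which per the abstract combines value iteration with upper confidence bounds. The entropic OCE is used because it has a closed form, and the helper names and the toy MDP are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "risk map" sends (next-state values, next-state probabilities) to a scalar.
RiskMap = Callable[[Sequence[float], Sequence[float]], float]


def expectation(values: Sequence[float], probs: Sequence[float]) -> float:
    """Risk-neutral special case: plain expected value."""
    return sum(p * v for v, p in zip(values, probs))


def entropic_risk(gamma: float) -> RiskMap:
    """Entropic OCE, -(1/gamma) log E[exp(-gamma X)], in closed form."""
    def rho(values: Sequence[float], probs: Sequence[float]) -> float:
        return -math.log(sum(p * math.exp(-gamma * v)
                             for v, p in zip(values, probs))) / gamma
    return rho


def recursive_oce_vi(P, r, risk: RiskMap) -> Tuple[List[List[float]], List[List[int]]]:
    """Backward induction V_h(s) = max_a { r_h(s, a) + risk(V_{h+1}(S')) }.

    P[h][s][a] is the list of next-state probabilities and r[h][s][a] the
    deterministic reward at step h. Returns values V[h][s] (with V[H] = 0)
    and a greedy policy pi[h][s].
    """
    H, S = len(r), len(r[0])
    V = [[0.0] * S for _ in range(H + 1)]  # terminal values V[H][s] = 0
    pi = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        for s in range(S):
            q = [r[h][s][a] + risk(V[h + 1], P[h][s][a])
                 for a in range(len(r[h][s]))]
            pi[h][s] = max(range(len(q)), key=q.__getitem__)
            V[h][s] = q[pi[h][s]]
    return V, pi


def one_hot(i: int, n: int = 3) -> List[float]:
    return [1.0 if j == i else 0.0 for j in range(n)]


# Hypothetical 3-state, 2-action, horizon-2 MDP starting in state 0.
# Step 0, state 0: action 0 moves surely to state 1; action 1 moves to state 0
# or state 2 with probability 1/2 each. Step 1 pays 2.2, 1.0, 0.0 in states 0, 1, 2.
P = [
    [[one_hot(1), [0.5, 0.0, 0.5]], [one_hot(1), one_hot(1)], [one_hot(2), one_hot(2)]],
    [[one_hot(s), one_hot(s)] for s in range(3)],
]
r = [
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    [[2.2, 2.2], [1.0, 1.0], [0.0, 0.0]],
]

V_neutral, pi_neutral = recursive_oce_vi(P, r, expectation)
V_averse, pi_averse = recursive_oce_vi(P, r, entropic_risk(2.0))
print(pi_neutral[0][0], V_neutral[0][0])  # gamble (action 1), value 1.1
print(pi_averse[0][0], V_averse[0][0])    # safe action 0, value about 1.0
```

With the risk-neutral expectation the planner takes the 50/50 gamble (expected value 1.1), while the entropic OCE with γ = 2 values that gamble at about 0.34 and prefers the sure reward of 1. In a learning algorithm of the kind the abstract describes, P would presumably be replaced by empirical estimates and each backup made optimistic with an exploration bonus; the form of that bonus is not specified here.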

Videos

Recorded talk on SlidesLive.