Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents
Wenhao Xu, Xuefeng Gao, Xuedong He

TL;DR
This paper introduces a risk-sensitive reinforcement learning formulation for tabular Markov decision processes based on recursive optimized certainty equivalents (OCEs), together with a learning algorithm whose regret has optimal dependence on the number of episodes and the number of actions.
Contribution
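It develops a new episodic risk-sensitive RL formulation with recursive OCEs and proposes an efficient algorithm, based on value iteration and upper confidence bounds, with a proven regret upper bound and a matching-rate minimax lower bound in the number of episodes and actions.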
Findings
The proposed algorithm achieves a regret rate with optimal dependence on the number of episodes and the number of actions.
An upper bound on the algorithm's regret and a minimax lower bound are both established.
The OCE framework covers several risk measures as special cases, including entropic risk, conditional value-at-risk (CVaR), and mean-variance models.
Abstract
The optimized certainty equivalent (OCE) is a family of risk measures that covers important examples such as entropic risk, conditional value-at-risk and mean-variance models. In this paper, we propose a new episodic risk-sensitive reinforcement learning formulation based on tabular Markov decision processes with recursive OCEs. We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound. We derive an upper bound on the regret of the proposed algorithm, and also establish a minimax lower bound. Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
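To make the OCE concrete, here is a minimal sketch that is not taken from the paper. It assumes the standard Ben-Tal and Teboulle definition for a reward-type random variable X, OCE_u(X) = sup over lambda of { lambda + E[u(X - lambda)] }, with u a concave utility. The function names (`oce`, `entropic_utility`, `cvar_utility`), the example distribution, and the use of ternary search are all illustrative choices, and the search relies on the assumption that a maximizer lies within the range of X, which holds for the normalized utilities used here.

```python
import math

def oce(values, probs, utility, iters=200):
    """Approximate OCE_u(X) = sup_lam { lam + E[u(X - lam)] } for a discrete X.

    Assumes `utility` is concave, so the objective is concave in lam and a
    ternary search over [min(values), max(values)] finds a maximizer.
    """
    def objective(lam):
        return lam + sum(p * utility(x - lam) for x, p in zip(values, probs))

    lo, hi = min(values), max(values)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1  # a maximizer lies to the right of m1
        else:
            hi = m2  # a maximizer lies to the left of m2
    return objective((lo + hi) / 2)

def entropic_utility(gamma):
    """u(t) = (1 - exp(-gamma * t)) / gamma, which yields the entropic risk."""
    return lambda t: (1.0 - math.exp(-gamma * t)) / gamma

def cvar_utility(alpha):
    """u(t) = min(t, 0) / alpha, which yields the lower-tail CVaR at level alpha."""
    return lambda t: min(t, 0.0) / alpha

# Hypothetical example: a reward uniform on {0, 1, 2, 3}.
values = [0.0, 1.0, 2.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]

gamma = 0.5
entropic_numeric = oce(values, probs, entropic_utility(gamma))
# Closed form of the entropic risk: -(1/gamma) * log E[exp(-gamma * X)].
entropic_closed = -math.log(
    sum(p * math.exp(-gamma * x) for x, p in zip(values, probs))
) / gamma

alpha = 0.5
cvar_numeric = oce(values, probs, cvar_utility(alpha))
# With alpha = 0.5 the worst half of the outcomes is {0, 1}, whose mean is 0.5.
cvar_closed = 0.5

print(entropic_numeric, entropic_closed)
print(cvar_numeric, cvar_closed)
```

In both cases the numerical supremum should agree with the closed form up to floating-point error. With the risk-neutral utility u(t) = t the objective no longer depends on lambda and the OCE reduces to the plain expectation.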
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference · Reinforcement Learning in Robotics
