On the Sample Complexity of Discounted Reinforcement Learning with Optimized Certainty Equivalents
Oliver Mortensen, Mohammad Sadegh Talebi

TL;DR
This paper analyzes the sample complexity of risk-sensitive reinforcement learning in finite discounted MDPs using optimized certainty equivalents, providing bounds and characterizations for PAC-learnability.
Contribution
It offers an exact characterization of utility functions for PAC-learnability and derives tight PAC sample complexity bounds for value and policy learning under OCE risk measures.
Findings
PAC-learnability depends on the domain of utility functions.
Derived PAC sample complexity bounds for value and policy learning.
Established lower bounds demonstrating tightness and dependence on horizon and risk parameters.
Abstract
We study risk-sensitive reinforcement learning in finite discounted MDPs, where a generative model of the MDP is assumed to be available. We consider a family of risk measures called the optimized certainty equivalent (OCE), which includes important risk measures such as entropic risk, CVaR, and mean-variance. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive OCE. We provide an exact characterization of utility functions for which the corresponding OCE defines an objective that is PAC-learnable. We analyze a simple model-based approach and derive PAC sample complexity bounds. We establish that whenever the utility function does not have full domain, the corresponding problem is not PAC-learnable. Finally, we establish corresponding lower bounds for both…
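To make the OCE family concrete, here is a minimal sketch of the static (one-step) OCE evaluated on a finite sample of rewards. It assumes the standard reward-based definition OCE_u(X) = sup over lambda of { lambda + E[u(X - lambda)] }; the paper's own sign convention and its recursive, MDP-level objective may differ. All function names below are hypothetical and are not taken from the paper.

```python
import math

def oce_objective(samples, u, lam):
    """Empirical OCE objective: lam + mean of u(x - lam) over the samples."""
    return lam + sum(u(x - lam) for x in samples) / len(samples)

def cvar_via_oce(samples, alpha):
    """CVaR at level alpha (mean of the worst alpha-fraction of rewards).

    Assumes the utility u(t) = min(t, 0) / alpha. The empirical objective is
    then concave and piecewise linear in lam with kinks at the sample points,
    so the supremum is attained at one of the samples.
    """
    u = lambda t: min(t, 0.0) / alpha
    return max(oce_objective(samples, u, lam) for lam in samples)

def entropic_risk(samples, beta):
    """Entropic risk with parameter beta > 0.

    For u(t) = (1 - exp(-beta * t)) / beta the supremum has the closed form
    -(1 / beta) * log E[exp(-beta * X)], which is what is computed here.
    """
    mgf = sum(math.exp(-beta * x) for x in samples) / len(samples)
    return -math.log(mgf) / beta

# Illustrative data only: ten equally likely rewards 0, 1, ..., 9.
rewards = [float(r) for r in range(10)]
cvar_20 = cvar_via_oce(rewards, alpha=0.2)   # mean of the two worst rewards
ent_1 = entropic_risk(rewards, beta=1.0)     # risk-averse, below the mean
```

With these ten rewards, the CVaR at level 0.2 is the average of the two smallest rewards, and the entropic risk lies below the sample mean, reflecting risk aversion. The sketch does not touch the sample complexity analysis itself; it only illustrates how different utility functions yield different members of the OCE family.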
