Recursive Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model
Oliver Mortensen, Mohammad Sadegh Talebi

TL;DR
This paper analyzes the sample complexity of risk-sensitive reinforcement learning in finite discounted MDPs with recursive entropic risk measures, providing tight bounds and a new model-based algorithm.
Contribution
It introduces a novel model-based algorithm for recursive ERM in MDPs and establishes tight PAC bounds on sample complexity for both risk-averse and risk-seeking cases.
Findings
Sample complexity bounds scale exponentially with the risk parameter and the discount factor.
Lower bounds show exponential dependence is unavoidable.
First rigorous guarantees for recursive ERM in both risk regimes.
Abstract
We study risk-sensitive reinforcement learning in finite discounted MDPs with recursive entropic risk measures (ERM), where the sign of the risk parameter controls the agent's risk attitude: one sign corresponds to risk-averse and the other to risk-seeking behavior. A generative model of the MDP is assumed to be available. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive ERM. We introduce a model-based algorithm, called Model-Based ERM Q-Value Iteration (MB-RS-QVI), and derive PAC-type bounds on its sample complexity for both value and policy learning. Both PAC bounds scale exponentially with a quantity that depends on the discount factor. We also establish corresponding lower bounds for both value and policy learning, showing that exponential dependence on…
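The model-based recipe described in the abstract can be illustrated with a minimal sketch: draw a fixed number of next-state samples per state-action pair from the generative model, form the empirical transition kernel, and run value iteration in which the expectation over next states is replaced by an entropic risk measure. This is not the authors' implementation. The function names, the `sample_next_state` callback, the assumption of known rewards, the fixed iteration count, and the sign convention (positive `beta` taken as risk-averse) are all assumptions made here for illustration.

```python
import numpy as np

def erm(values, probs, beta):
    """Entropic risk of a discrete random variable: -(1/beta) * log E[exp(-beta * X)].

    Assumed convention (not confirmed by the source): beta > 0 is risk-averse,
    beta < 0 is risk-seeking. beta must be nonzero.
    """
    z = -beta * values
    m = z.max()  # shift by the maximum for a numerically stable log-sum-exp
    return -(m + np.log(np.dot(probs, np.exp(z - m)))) / beta

def estimate_model(sample_next_state, n_states, n_actions, n_samples):
    """Empirical transition kernel from n_samples generative-model calls per (s, a).

    sample_next_state(s, a) is a hypothetical callback returning a next-state index.
    """
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s, a, sample_next_state(s, a)] += 1.0
    return p_hat / n_samples

def risk_sensitive_qvi(p_hat, rewards, gamma, beta, n_iters=200):
    """Value iteration on the estimated model with a recursive ERM backup."""
    n_states, n_actions, _ = p_hat.shape
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        v = q.max(axis=1)  # greedy state values under the current Q estimate
        q = np.array([[rewards[s, a] + gamma * erm(v, p_hat[s, a], beta)
                       for a in range(n_actions)]
                      for s in range(n_states)])
    return q
```

A greedy policy can then be read off with `q.argmax(axis=1)`. Under the assumed convention, a larger positive `beta` weights bad outcomes more heavily in each backup, so the resulting Q-values are no larger than those obtained with a negative `beta` on the same model.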
Taxonomy
Topics: Risk and Portfolio Optimization · Reservoir Engineering and Simulation Methods · Water resources management and optimization
