Recursive Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model

Oliver Mortensen; Mohammad Sadegh Talebi

arXiv:2506.00286 · cs.LG · May 20, 2026


TL;DR

This paper analyzes the sample complexity of risk-sensitive reinforcement learning in finite discounted MDPs with recursive entropic risk measures, providing tight bounds and a new model-based algorithm.
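To make "entropic risk" concrete, here is the one-step entropic risk of a discrete random reward. The sketch assumes the convention ERM_β(X) = -(1/β) log E[exp(-βX)], chosen so that β > 0 is risk-averse and β < 0 is risk-seeking. The helper name `entropic_risk` is hypothetical, and the paper's exact normalization may differ.

```python
import math


def entropic_risk(values, probs, beta):
    """One-step entropic risk of a discrete random reward X (hypothetical helper).

    Assumed convention: ERM_beta(X) = -(1 / beta) * log E[exp(-beta * X)],
    so beta > 0 gives a value below the mean (risk-averse) and
    beta < 0 gives a value above the mean (risk-seeking).
    """
    if beta == 0:
        raise ValueError("beta must be nonzero; the limit beta -> 0 is the mean")
    exponents = [-beta * x for x in values]
    m = max(exponents)                  # shift so every exp() argument is <= 0
    total = sum(p * math.exp(e - m) for p, e in zip(probs, exponents))
    return -(m + math.log(total)) / beta


# A fair coin paying 0 or 10: same mean, different risk attitudes.
values, probs = [0.0, 10.0], [0.5, 0.5]
mean = sum(p * x for p, x in zip(probs, values))
print(mean)                                # 5.0
print(entropic_risk(values, probs, 0.5))   # about 1.37, below the mean
print(entropic_risk(values, probs, -0.5))  # about 8.63, above the mean
```

The max-shift (log-sum-exp trick) keeps `math.exp` from overflowing when |β| times the reward is large.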

Contribution

It introduces a novel model-based algorithm for recursive ERM in MDPs and establishes tight PAC bounds on sample complexity for both risk-averse and risk-seeking cases.
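Here "recursive" means the entropic risk is applied one transition at a time inside the Bellman recursion, rather than once to the whole discounted return. Under a common convention (assumed here; the paper's exact normalization may differ), with rewards $r(s,a)$, transition kernel $P$, discount $\gamma \in (0,1)$ and risk parameter $\beta \neq 0$ ($\beta > 0$ risk-averse), the optimality equation can be sketched as:

```latex
Q^\star(s,a) = r(s,a) - \frac{\gamma}{\beta}\,
  \log \sum_{s'} P(s' \mid s,a)\, \exp\!\big(-\beta\, V^\star(s')\big),
\qquad
V^\star(s) = \max_{a} Q^\star(s,a).
```

As $\beta \to 0$ the log term tends to $\gamma \sum_{s'} P(s' \mid s,a)\, V^\star(s')$, which recovers the standard risk-neutral Bellman optimality equation.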

Findings

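01. Sample complexity bounds scale exponentially with $|\beta|/(1-\gamma)$, where $\beta$ is the risk parameter and $\gamma$ the discount factor.

02. Lower bounds show that this exponential dependence is unavoidable.

03. First rigorous sample complexity guarantees for recursive ERM in both the risk-averse and the risk-seeking regime.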
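The exponential dependence can be made tangible by tabulating the quantity the bounds depend on. The helper `exp_risk_factor` below is hypothetical and only computes $\exp(|\beta|/(1-\gamma))$; the actual bounds contain further terms and constants that are omitted here.

```python
import math


def exp_risk_factor(beta, gamma):
    """Illustrative factor exp(|beta| / (1 - gamma)) (hypothetical helper).

    Only the quantity in the exponent is modeled; the remaining terms of the
    actual sample complexity bounds are deliberately left out.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return math.exp(abs(beta) / (1.0 - gamma))


# Risk-averse and risk-seeking agents with the same |beta| get the same factor.
for gamma in (0.9, 0.95, 0.99):
    for beta in (0.1, -0.1, 1.0):
        print(f"gamma={gamma:<5} beta={beta:<5} factor={exp_risk_factor(beta, gamma):.3e}")
```

For example, $|\beta| = 1$ with $\gamma = 0.99$ gives $e^{100}$, roughly $2.7 \times 10^{43}$, whereas $|\beta| = 0.1$ with $\gamma = 0.9$ gives only $e \approx 2.7$.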

Abstract

We study risk-sensitive reinforcement learning in finite discounted MDPs with recursive entropic risk measures (ERM), where the risk parameter $\beta \neq 0$ controls the agent's risk attitude: $\beta > 0$ for risk-averse and $\beta < 0$ for risk-seeking behavior. A generative model of the MDP is assumed to be available. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive ERM. We introduce a model-based algorithm, called Model-Based Risk-Sensitive $Q$-Value Iteration (MB-RS-QVI), and derive PAC-type bounds on its sample complexity for both value and policy learning. Both PAC bounds scale exponentially with $|\beta|/(1-\gamma)$, where $\gamma$ is the discount factor. We also establish corresponding lower bounds for both value and policy learning, showing that exponential dependence on…
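The abstract describes MB-RS-QVI only at a high level. The following is a minimal sketch of what a model-based risk-sensitive Q-value iteration with a generative model could look like, assuming (i) the generative model is a function `sample_next_state(s, a, rng)` returning one next state, (ii) a fixed budget of `n_samples` calls per state-action pair builds the empirical kernel, and (iii) the recursion uses the one-step entropic risk with $\beta > 0$ risk-averse. All names are hypothetical; this is not the authors' implementation, and the paper's algorithm may differ in its sample allocation, stopping rule, or policy extraction.

```python
import numpy as np


def empirical_model(sample_next_state, n_states, n_actions, n_samples, rng):
    """Estimate P_hat(s' | s, a) from n_samples generative-model calls per (s, a).

    Assumed (hypothetical) interface: sample_next_state(s, a, rng) -> s'.
    """
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    return p_hat / n_samples


def risk_sensitive_qvi(p, r, gamma, beta, n_iters=1000, tol=1e-10):
    """Q-value iteration with a one-step entropic risk (beta != 0, beta > 0 risk-averse):
    Q(s, a) <- r(s, a) - (gamma / beta) * log sum_s' p(s' | s, a) exp(-beta * V(s')).
    """
    if beta == 0:
        raise ValueError("beta must be nonzero")
    q = np.zeros_like(r, dtype=float)
    for _ in range(n_iters):
        v = q.max(axis=1)                          # greedy state values, shape (S,)
        z = -beta * v                              # exponents, shape (S,)
        m = z.max()                                # shift for a stable log-sum-exp
        log_mgf = m + np.log(p @ np.exp(z - m))    # log E[exp(-beta V(s'))], shape (S, A)
        q_new = r - (gamma / beta) * log_mgf
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q


# Usage on a small synthetic MDP (all numbers are arbitrary illustrations).
rng = np.random.default_rng(0)
S, A = 4, 2
p_true = rng.dirichlet(np.ones(S), size=(S, A))   # hypothetical true kernel, shape (S, A, S)
r = rng.uniform(0.0, 1.0, size=(S, A))            # rewards in [0, 1]


def sample_next_state(s, a, rng):
    # Stand-in for the generative model: one draw s' ~ P(. | s, a).
    return rng.choice(S, p=p_true[s, a])


p_hat = empirical_model(sample_next_state, S, A, n_samples=2000, rng=rng)
q_hat = risk_sensitive_qvi(p_hat, r, gamma=0.9, beta=1.0)
policy = q_hat.argmax(axis=1)                     # greedy policy w.r.t. the estimated Q
```

The log-sum-exp shift keeps `np.exp` from overflowing when $|\beta| V$ is large, which is exactly the regime $|\beta|/(1-\gamma) \gg 1$ that the bounds single out. Because the entropic risk is monotone and translation-invariant, each update is a $\gamma$-contraction in the sup norm, so the iteration converges.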

Taxonomy

Topics: Risk and Portfolio Optimization · Reservoir Engineering and Simulation Methods · Water resources management and optimization