Tight Sample Complexity Bounds for Entropic Best Policy Identification

Amer Essakine; Claire Vernade

arXiv:2605.13717·cs.LG·May 14, 2026


TL;DR

This paper improves the sample complexity bounds for entropic risk-sensitive policy identification in reinforcement learning, closing the exponential gap between existing lower and upper bounds by leveraging sharper concentration bounds and a new stopping rule.

Contribution

It introduces a forward-model-based algorithm with KL-based exploration bonuses adapted to the entropic criterion, together with new analysis techniques that bring the sample complexity upper bound in line with the lower bound.

Findings

01

Achieves a sample complexity upper bound whose exponential dependence on the horizon matches the known lower bound.

02

Uses sharper concentration bounds derived from the exponential utility's smoothness.

03

Proposes a new stopping rule to exploit the tightness of the bounds.

Abstract

We study best-policy identification for finite-horizon risk-sensitive reinforcement learning under the entropic risk measure. Recent work established a constant gap in the exponential horizon dependence between lower and upper bounds on the number of samples required to identify an approximately optimal policy. Precisely, known lower bounds scale in $\Omega(e^{|\beta| H})$ where $H$ is the horizon of the MDP, while the state-of-the-art upper bound achieves at best $O(e^{2|\beta| H})$ (arXiv:2506.00286v2) using a generative model. We show that this extra exponential factor can be traced to overly loose concentration control for exponential utilities. To close this open gap, we revisit the analysis of this problem through a forward-model based algorithm building on KL-based exploration bonuses that we adapt to the entropic criterion. The improvement we get is due to two main novel…
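To make the quantities in the abstract concrete, here is a minimal Python sketch, not the paper's algorithm. It assumes the standard definition of the entropic risk of a return $X$, namely $\frac{1}{\beta}\log \mathbb{E}[e^{\beta X}]$; the sign convention used in the paper itself is not visible in this excerpt. The function names `entropic_risk` and `horizon_factors` are hypothetical, chosen for illustration only. The second function compares the horizon factors $e^{|\beta| H}$ and $e^{2|\beta| H}$ quoted above, ignoring all other terms in the bounds.

```python
import math


def entropic_risk(returns, beta):
    """Empirical entropic risk (1/beta) * log(mean(exp(beta * r))).

    Assumes the standard definition; uses a log-sum-exp shift for stability.
    """
    if beta == 0:
        raise ValueError("beta must be nonzero; the beta -> 0 limit is the mean")
    scaled = [beta * r for r in returns]
    shift = max(scaled)  # subtract the max so exp() cannot overflow
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return (shift + math.log(mean_exp)) / beta


def horizon_factors(beta, horizon):
    """Return (lower, upper, ratio) for the exponential horizon terms only.

    lower = exp(|beta| * H), upper = exp(2 * |beta| * H); every other
    dependence in the actual bounds is deliberately left out.
    """
    lower = math.exp(abs(beta) * horizon)
    upper = math.exp(2 * abs(beta) * horizon)
    return lower, upper, upper / lower


if __name__ == "__main__":
    sample_returns = [0.0, 2.0]  # illustrative data, not from the paper
    print(entropic_risk(sample_returns, beta=-1.0))  # risk-averse: below the mean
    print(entropic_risk(sample_returns, beta=1.0))   # risk-seeking: above the mean
    print(horizon_factors(beta=0.5, horizon=10))
```

Under these assumptions, the ratio between the two horizon factors is itself $e^{|\beta| H}$, so the gap between the earlier upper bound and the lower bound grows exponentially with the horizon.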
