Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes

Cyrille Kone; Kevin Jamieson

arXiv:2605.03921·cs.LG·May 6, 2026

TL;DR

This paper introduces a computationally efficient, asymptotically optimal algorithm for policy identification in tabular MDPs. It improves on prior methods such as MOCA and PEDEL in computational cost and in its dependence on $\log(1/\delta)$.

Contribution

Proposes a randomized algorithm for best policy identification in finite-horizon episodic MDPs that combines posterior sampling with an online learning algorithm to guide exploration, achieving asymptotic optimality at a practical computational cost.

Findings

01

Achieves asymptotic optimality in sample complexity.

02

Runs in $O(S^2 A H)$ time per episode, matching standard model-based approaches.

03

Guarantees remain meaningful in the asymptotic regime, avoiding sub-optimal dependence on $\log(1/\delta)$.

Abstract

We study the $(\varepsilon, \delta)$-PAC policy identification problem in finite-horizon episodic Markov Decision Processes. Existing approaches provide finite-time guarantees for approximate settings ($\varepsilon > 0$) but suffer from high computational cost, rendering them hard to implement, and also suffer from suboptimal dependence on $\log(1/\delta)$. We propose a randomized and computationally efficient algorithm for best policy identification that combines posterior sampling with an online learning algorithm to guide exploration in the MDP. Our method achieves asymptotic optimality in sample complexity, also in terms of posterior contraction rate, and runs in $O(S^{2} A H)$ per episode, matching standard model-based approaches. Unlike prior algorithms such as MOCA and PEDEL, our guarantees remain meaningful in the asymptotic regime and avoid sub-optimal polynomial dependence on…
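To make the $O(S^{2} A H)$ per-episode figure concrete, here is a minimal sketch of generic posterior-sampling planning in a tabular finite-horizon MDP. It is not the authors' algorithm: it omits the online learning component that guides exploration, and it assumes independent Dirichlet posteriors over transitions and known rewards, neither of which is stated in the abstract. The function names `sample_transitions` and `backward_induction` are hypothetical.

```python
import random


def sample_transitions(counts, rng, prior=1.0):
    """Draw one transition model from independent Dirichlet posteriors.

    counts[s][a][s2] is the number of observed transitions (s, a) -> s2.
    The Dirichlet prior and its parameter are assumptions of this sketch.
    """
    model = []
    for s_counts in counts:
        row = []
        for sa_counts in s_counts:
            # A Dirichlet draw is a vector of normalized Gamma draws.
            g = [rng.gammavariate(c + prior, 1.0) for c in sa_counts]
            total = sum(g)
            row.append([x / total for x in g])
        model.append(row)
    return model


def backward_induction(P, R, H):
    """Optimal policy for transitions P[s][a][s2], rewards R[s][a], horizon H.

    The loops over steps, states, and actions, plus the inner sum over
    next states, cost O(S^2 A H) arithmetic operations.
    """
    S, A = len(P), len(P[0])
    V = [0.0] * S  # value after the final step
    policy = []
    for _ in range(H):
        new_V, step_policy = [], []
        for s in range(S):
            q = [
                R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(S))
                for a in range(A)
            ]
            best = max(range(A), key=q.__getitem__)
            step_policy.append(best)
            new_V.append(q[best])
        V = new_V
        policy.append(step_policy)
    policy.reverse()  # policy[h][s] is the greedy action at step h
    return policy, V
```

One episode of such a scheme would call `sample_transitions` on the current counts and then `backward_induction` on the sampled model. The Dirichlet draw costs $O(S^{2} A)$, so planning dominates the per-episode cost.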
