Finite-Sample Analysis of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

Suei-Wen Chen; Keith Ross; Pierre Youssef

arXiv:2410.02994·cs.LG·October 7, 2024

TL;DR

This paper provides a finite-sample analysis of the Monte Carlo Exploring Starts algorithm in reinforcement learning, establishing sample complexity bounds for convergence to an optimal policy in stochastic shortest path problems.

Contribution

The paper derives a novel finite-sample bound for a modified MCES algorithm, supported by a new convergence-rate result for policy iteration in stochastic shortest path problems.

Findings

01

The algorithm returns an optimal policy after $\tilde{O}(SAK^3 \log^3(1/\delta))$ episodes with probability at least $1 - \delta$.

02

Provides the first finite-sample bound for MCES-style algorithms in stochastic shortest path problems.

03

The sample complexity depends on the number of states $S$, the number of actions $A$, the episode-length proxy $K$, and the reward bounds.
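
To get a feel for how the bound in finding 01 scales, its leading term can be evaluated numerically. The sketch below is illustrative only: the function name `leading_term` is hypothetical, and the constant and the logarithmic factors hidden by $\tilde{O}$ are dropped, so the values are meaningful for comparing settings, not as actual episode counts.

```python
import math


def leading_term(S: int, A: int, K: float, delta: float) -> float:
    """Leading term S * A * K^3 * log^3(1/delta) of the bound.

    Constants and the log factors hidden by O-tilde are dropped (assumption),
    so only ratios between settings are informative.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return S * A * K**3 * math.log(1.0 / delta) ** 3


base = leading_term(S=10, A=4, K=5, delta=0.05)
doubled_K = leading_term(S=10, A=4, K=10, delta=0.05)
doubled_S = leading_term(S=20, A=4, K=5, delta=0.05)

# Doubling K multiplies the term by 2**3; doubling S multiplies it by 2.
print(doubled_K / base, doubled_S / base)
```

The cubic dependence on $K$ dominates: lengthening episodes is far more costly in this term than adding states or actions, which enter only linearly.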

Abstract

Monte Carlo Exploring Starts (MCES), which aims to learn the optimal policy using only sample returns, is a simple and natural algorithm in reinforcement learning which has been shown to converge under various conditions. However, the convergence rate analysis for MCES-style algorithms in the form of sample complexity has received very little attention. In this paper we develop a finite sample bound for a modified MCES algorithm which solves the stochastic shortest path problem. To this end, we prove a novel result on the convergence rate of the policy iteration algorithm. This result implies that with probability at least $1 - \delta$, the algorithm returns an optimal policy after $\tilde{O}(S A K^{3} \log^{3} \frac{1}{\delta})$ sampled episodes, where $S$ and $A$ denote the number of states and actions respectively, $K$ is a proxy for episode length, and $\tilde{O}$ hides logarithmic factors…
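
For readers unfamiliar with MCES, the following is a minimal sketch of the textbook form of the algorithm (random initial state-action pair, first-visit sample-average returns, greedy policy improvement). It is not the modified algorithm analyzed in the paper. The toy chain environment, the success probability 0.4, the step cap `max_steps`, and all names (`step`, `mces`) are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Toy stochastic shortest path (assumed): states 0..3, absorbing goal state 4.
N_STATES, GOAL, ACTIONS = 4, 4, (0, 1)


def step(state, action, rng):
    """Every move costs 1 (reward -1) until the goal is reached."""
    if action == 0:
        nxt = state + 1                               # safe: advance one state
    else:
        nxt = GOAL if rng.random() < 0.4 else state   # shortcut: reach goal w.p. 0.4, else stay
    return nxt, -1.0


def mces(num_episodes, seed=0, max_steps=200):
    rng = random.Random(seed)
    q = defaultdict(float)   # running mean of sampled returns per (state, action)
    n = defaultdict(int)     # number of returns averaged per (state, action)
    policy = {s: 0 for s in range(N_STATES)}
    for _ in range(num_episodes):
        # Exploring start: uniformly random initial state-action pair.
        s, a = rng.randrange(N_STATES), rng.choice(ACTIONS)
        episode = []
        for _ in range(max_steps):  # cap is a safety guard; returns are truncated if hit
            nxt, r = step(s, a, rng)
            episode.append((s, a, r))
            if nxt == GOAL:
                break
            s, a = nxt, policy[nxt]  # afterwards follow the current greedy policy
        # Walk backward accumulating the undiscounted return; overwriting keeps
        # the return that follows the FIRST visit of each pair.
        g = 0.0
        first_visit_return = {}
        for s, a, r in reversed(episode):
            g += r
            first_visit_return[(s, a)] = g
        for (s, a), ret in first_visit_return.items():
            n[(s, a)] += 1
            q[(s, a)] += (ret - q[(s, a)]) / n[(s, a)]          # incremental sample average
            policy[s] = max(ACTIONS, key=lambda b: q[(s, b)])   # greedy improvement at s
    return policy, q


policy, q = mces(num_episodes=2000)
print(policy)
```

In this toy problem the safe action from state 3 always reaches the goal in one step, so its sampled return is exactly -1 and the learned policy should select it there. The sample-complexity question studied in the paper is how many such episodes are needed before the returned policy is optimal in every state.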

Taxonomy

Topics: Evolutionary Algorithms and Applications · Simulation Techniques and Applications