On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts

Jun Liu

arXiv:2007.10916 · math.OC · July 22, 2020

TL;DR

This paper studies the convergence of the Monte Carlo Exploring Starts (MCES) reinforcement learning algorithm for stochastic shortest path problems (undiscounted costs) and, as a side result, proves a version of the supermartingale convergence theorem used in stochastic approximation.
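For reference, the stochastic shortest path (SSP) setting can be summarized by its Bellman equations. The notation below is standard textbook notation assumed for illustration, not taken from the paper: nonterminal states 1, ..., n, a cost-free absorbing terminal state 0, controls u in U(i), transition probabilities p_ij(u), expected stage costs g(i,u), and no discounting. Under the usual SSP assumptions (for example, when every stationary policy reaches the terminal state with probability 1), the optimal costs J* and Q-factors Q* are the unique solution of these equations; MCES tries to estimate Q* from simulated returns.

```latex
% Terminal state 0 is absorbing and cost-free, so J^*(0) = 0 and it is omitted from the sums.
J^*(i) = \min_{u \in U(i)} \Big[\, g(i,u) + \sum_{j=1}^{n} p_{ij}(u)\, J^*(j) \Big],
\qquad i = 1, \dots, n,

Q^*(i,u) = g(i,u) + \sum_{j=1}^{n} p_{ij}(u) \min_{v \in U(j)} Q^*(j,v),
\qquad J^*(i) = \min_{u \in U(i)} Q^*(i,u).
```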

Contribution

It establishes convergence results for MCES in the undiscounted-cost setting and gives a proof of a version of the supermartingale convergence theorem.
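One common form of the supermartingale convergence theorem used in stochastic approximation is stated below for orientation. The symbols are generic and chosen here for illustration; the exact version proved in the paper may differ in its hypotheses. In stochastic approximation, Y_t is typically a Lyapunov-type error measure for the iterates, and the conclusion that the X_t are summable is what forces the error toward zero.

```latex
% Y_t, X_t, Z_t: nonnegative random variables adapted to a filtration (\mathcal{F}_t)_{t \ge 0}.
\text{If}\quad \mathbb{E}\big[\, Y_{t+1} \mid \mathcal{F}_t \,\big] \le Y_t - X_t + Z_t \quad \text{for all } t
\quad\text{and}\quad \sum_{t=0}^{\infty} Z_t < \infty \ \text{a.s.},

\text{then}\quad Y_t \to Y_\infty \ \text{a.s. for some finite random variable } Y_\infty \ge 0
\quad\text{and}\quad \sum_{t=0}^{\infty} X_t < \infty \ \text{a.s.}
```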

Findings

01

Convergence of MCES in stochastic shortest path problems is proven.

02

Proves a version of the supermartingale convergence theorem commonly used in stochastic approximation.

03

Complements existing partial results on the convergence of MCES.

Abstract

A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is selected at each iteration. The convergence of this algorithm in the general setting has been an open question. In this paper, we investigate the convergence of this algorithm for the case with undiscounted costs, also known as the stochastic shortest path problem. The results complement existing partial results on this topic and thereby help to further settle the open problem. As a side result, we also provide a proof of a version of the supermartingale convergence theorem commonly used in stochastic approximation.
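The loop described in the abstract (estimate values from simulated returns, then act greedily) can be sketched in Python as below. This is a minimal illustration, not the paper's algorithm statement: the four-state toy problem, its `step`/`gamble` actions and costs, the episode count, and the function names `transition`, `greedy` and `mces` are all hypothetical. The sketch follows the common textbook variant: each episode starts from a uniformly random state-action pair, then follows the cost-minimizing greedy policy for the current Q-factors, and every visited pair has its Q-factor set to the sample average of its first-visit returns. The toy problem is built so that every policy reaches the terminal state with probability 1, so episodes always end; the update rule and assumptions analyzed in the paper may differ.

```python
import random

# Hypothetical toy SSP (illustration only, not from the paper):
# nonterminal states 1..4, cost-free absorbing terminal state 0.
#   action 0 ("step"):   cost 1.0, move deterministically to s - 1
#   action 1 ("gamble"): cost 1.3, reach 0 w.p. 0.5, otherwise stay at s
# Every stationary policy reaches state 0 with probability 1 (all proper),
# so every simulated episode terminates.
N_STATES = 4
ACTIONS = (0, 1)
TERMINAL = 0


def transition(s, a, rng):
    """Sample (stage_cost, next_state) for action a in nonterminal state s."""
    if a == 0:
        return 1.0, s - 1
    return 1.3, (TERMINAL if rng.random() < 0.5 else s)


def greedy(q, s):
    """Cost-minimizing action in state s; ties go to the lower action index."""
    return min(ACTIONS, key=lambda a: q[(s, a)])


def mces(num_episodes=20_000, seed=0):
    """First-visit Monte Carlo with exploring starts on the toy SSP (no discounting)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(1, N_STATES + 1) for a in ACTIONS}
    visits = {pair: 0 for pair in q}
    for _ in range(num_episodes):
        # Exploring start: initial state-action pair drawn uniformly at random.
        s, a = rng.randint(1, N_STATES), rng.choice(ACTIONS)
        trajectory = []  # (state, action, stage_cost) triples
        while s != TERMINAL:
            cost, s_next = transition(s, a, rng)
            trajectory.append((s, a, cost))
            s = s_next
            if s != TERMINAL:
                # Act greedily w.r.t. the Q-factors, which stay fixed within an episode.
                a = greedy(q, s)
        # Walk backward accumulating the undiscounted cost-to-go; earlier visits
        # overwrite later ones, leaving the first-visit return of each pair.
        g = 0.0
        first_visit_return = {}
        for s_t, a_t, cost in reversed(trajectory):
            g += cost
            first_visit_return[(s_t, a_t)] = g
        # Evaluation step: running sample average of the observed returns.
        # The next episode then acts greedily on the updated Q-factors.
        for pair, ret in first_visit_return.items():
            visits[pair] += 1
            q[pair] += (ret - q[pair]) / visits[pair]
    return q


if __name__ == "__main__":
    q = mces()
    policy = {s: greedy(q, s) for s in range(1, N_STATES + 1)}
    print("greedy policy (0=step, 1=gamble):", policy)
    for pair in sorted(q):
        print(pair, round(q[pair], 3))
```

Solving the Bellman equations by hand for this toy problem gives Q*(1,·) = (1.0, 1.8), Q*(2,·) = (2.0, 2.3), Q*(3,·) = (3.0, 2.6) and Q*(4,·) = (3.6, 2.6) for (step, gamble), so the optimal policy steps in states 1 and 2 and gambles in states 3 and 4. The sample-average estimates returned by `mces()` can be compared against these values. Keeping the Q-factors fixed within an episode and updating them only at its end mirrors the "evaluate with simulated returns, then act greedily" structure of optimistic policy iteration.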

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Computability, Logic, AI Algorithms