On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts
Jun Liu

TL;DR
This paper investigates the convergence of the Monte Carlo Exploring Starts (MCES) reinforcement learning algorithm in stochastic shortest path problems and, as a side result, proves a version of the supermartingale convergence theorem.
Contribution
It establishes convergence results for MCES in undiscounted cost settings and offers a proof of a key supermartingale convergence theorem.
Findings
Convergence of MCES in stochastic shortest path problems is proven.
Provides a supermartingale convergence theorem relevant to stochastic approximation.
Complements existing partial results on reinforcement learning convergence.
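For orientation, one widely used form of the supermartingale convergence theorem in stochastic approximation is sketched below. This is the standard textbook statement, given here as an assumption about what "a version" refers to; the exact version proved in the paper may differ, and the symbols Y_t, X_t, Z_t and F_t are illustrative names, not taken from the paper.

```latex
Let $\{Y_t\}$, $\{X_t\}$, and $\{Z_t\}$ be sequences of nonnegative random
variables adapted to a filtration $\{\mathcal{F}_t\}$ such that
\[
  \mathbb{E}\left[ Y_{t+1} \mid \mathcal{F}_t \right] \le Y_t - X_t + Z_t
  \quad \text{for all } t,
  \qquad
  \sum_{t=0}^{\infty} Z_t < \infty \ \text{almost surely.}
\]
Then $Y_t$ converges almost surely to a finite nonnegative random variable,
and $\sum_{t=0}^{\infty} X_t < \infty$ almost surely.
```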
Abstract
A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is selected at each iteration. The convergence of this algorithm in the general setting has been an open question. In this paper, we investigate the convergence of this algorithm for the case with undiscounted costs, also known as the stochastic shortest path problem. The results complement existing partial results on this topic and thereby help further settle the open problem. As a side result, we also provide a proof of a version of the supermartingale convergence theorem commonly used in stochastic approximation.
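The loop described in the abstract (estimate action values from simulated returns, then act greedily) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's construction: the three-state problem, its costs and transition probabilities, the first-visit averaging rule, and all names (`transition`, `mces`, `TERMINAL`) are hypothetical. The toy problem is built so that every policy reaches the terminal state with probability 1, which keeps each simulated episode finite.

```python
import random

# Hypothetical toy stochastic shortest path problem (not from the paper):
# non-terminal states 0, 1, 2 and a cost-free terminal state 3.
TERMINAL = 3
N_STATES, N_ACTIONS = 3, 2  # action 0 = "step", action 1 = "jump"


def transition(state, action, rng):
    """Return (next_state, cost) for the toy problem."""
    if action == 0:
        # step: cost 1, advance one state with probability 0.9, else stay put
        return (state + 1 if rng.random() < 0.9 else state), 1.0
    # jump: cost 1.5, terminate with probability 0.5, else fall back to state 0
    return (TERMINAL if rng.random() < 0.5 else 0), 1.5


def mces(n_episodes=20000, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]    # cost-to-go estimates
    counts = [[0] * N_ACTIONS for _ in range(N_STATES)]  # samples per (s, a)
    policy = [0] * N_STATES                              # current greedy policy

    for _ in range(n_episodes):
        # Exploring start: the initial state-action pair is drawn uniformly.
        state = rng.randrange(N_STATES)
        action = rng.randrange(N_ACTIONS)
        episode = []
        while state != TERMINAL:
            next_state, cost = transition(state, action, rng)
            episode.append((state, action, cost))
            state = next_state
            if state != TERMINAL:
                action = policy[state]  # follow the current policy afterwards

        # Walk backwards accumulating the undiscounted cost-to-go. Later
        # overwrites come from earlier time steps, so each pair ends up
        # holding its first-visit return.
        g = 0.0
        first_visit = {}
        for s, a, c in reversed(episode):
            g += c
            first_visit[(s, a)] = g

        # Update the running average of returns for each visited pair.
        for (s, a), ret in first_visit.items():
            counts[s][a] += 1
            q[s][a] += (ret - q[s][a]) / counts[s][a]

        # Greedy improvement (minimum estimated cost) at the visited states.
        for s in {s for s, _ in first_visit}:
            policy[s] = min(range(N_ACTIONS), key=lambda a: q[s][a])

    return q, policy


q, policy = mces()
```

Because costs are undiscounted, the return is a plain sum of stage costs, and "greedy" means cost-minimizing. Running this on one toy instance says nothing about convergence in general, which is the question the paper addresses.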
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Computability, Logic, AI Algorithms
