Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

Emmeran Johnson; Ciara Pike-Burke; Patrick Rebeschini

arXiv:2302.11381 · math.OC · November 23, 2023

Abstract

Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning. Motivated by the instability of policy iteration (PI) with inexact policy evaluation, PMD algorithmically regularises the policy improvement step of PI. With exact policy evaluation, PI is known to converge linearly with a rate given by the discount factor $\gamma$ of a Markov Decision Process. In this work, we bridge the gap between PI and PMD with exact policy evaluation and show that the dimension-free $\gamma$-rate of PI can be achieved by the general family of unregularised PMD algorithms under an adaptive step-size. We show that both the rate and step-size are unimprovable for PMD: we provide matching lower bounds that demonstrate that the $\gamma$-rate is optimal for PMD methods as well as PI, and that the adaptive step-size is…
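To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It runs exact PI and one PMD instance on a small random MDP and records the value sequences. Everything specific here is an assumption for illustration: the random MDP, the helper names (`evaluate`, `greedy`, `optimal_value`, `pmd_kl`), the choice of the KL divergence as the mirror map, and the fixed step size `eta`, which is a placeholder rather than the adaptive step-size studied in the paper.

```python
import numpy as np

def evaluate(P, r, pi, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi, then form Q."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.sum(pi * r, axis=1)           # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V                   # shape (S, A)
    return V, Q

def greedy(Q):
    """Deterministic policy that picks an argmax action in every state."""
    pi = np.zeros_like(Q)
    pi[np.arange(Q.shape[0]), Q.argmax(axis=1)] = 1.0
    return pi

def optimal_value(P, r, gamma, sweeps=2000):
    """Reference V* from many sweeps of the Bellman optimality operator."""
    V = np.zeros(r.shape[0])
    for _ in range(sweeps):
        V = np.max(r + gamma * P @ V, axis=1)
    return V

def policy_iteration(P, r, gamma, pi, iters):
    """Exact PI: evaluate, then jump to the greedy policy."""
    values = []
    for _ in range(iters):
        V, Q = evaluate(P, r, pi, gamma)
        values.append(V)
        pi = greedy(Q)
    return values

def pmd_kl(P, r, gamma, pi, iters, eta):
    """PMD with a KL mirror map (assumed choice): pi_new ∝ pi * exp(eta * Q)."""
    log_pi = np.log(pi)
    values = []
    for _ in range(iters):
        V, Q = evaluate(P, r, np.exp(log_pi), gamma)
        values.append(V)
        log_pi = log_pi + eta * Q
        m = log_pi.max(axis=1, keepdims=True)  # stabilise the normalisation
        log_pi = log_pi - m - np.log(np.exp(log_pi - m).sum(axis=1, keepdims=True))
    return values

# Hypothetical random MDP with 5 states and 3 actions.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)           # normalise transition rows
r = rng.random((S, A))
uniform = np.full((S, A), 1.0 / A)

V_star = optimal_value(P, r, gamma)
pi_values = policy_iteration(P, r, gamma, uniform, iters=20)
pmd_values = pmd_kl(P, r, gamma, uniform, iters=20, eta=1.0)  # fixed eta: placeholder

# Sup-norm suboptimality gaps per iteration.
pi_gaps = [float(np.max(V_star - V)) for V in pi_values]
pmd_gaps = [float(np.max(V_star - V)) for V in pmd_values]
print("PI gaps: ", np.round(pi_gaps[:5], 6))
print("PMD gaps:", np.round(pmd_gaps[:5], 6))
```

For exact PI, each gap is at most $\gamma$ times the previous one, which is the linear $\gamma$-rate the abstract refers to. The KL update with a fixed `eta` takes a softer step towards the greedy policy, and larger values of `eta` move it closer to PI's greedy step.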

Videos

Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics