Block Policy Mirror Descent

Guanghui Lan; Yan Li; Tuo Zhao

arXiv:2201.05756·cs.LG·September 20, 2022

TL;DR

This paper introduces the block policy mirror descent (BPMD) method for regularized reinforcement learning. BPMD updates the policy at one sampled state per iteration, which keeps per-iteration computation cheap, and it comes with linear convergence guarantees and extensions to stochastic settings.

Contribution

The paper develops and analyzes the first block coordinate descent methods for policy optimization in reinforcement learning, providing convergence guarantees and efficiency improvements.

Findings

1. BPMD achieves fast linear convergence to the global optimum.

2. Uniform sampling yields worst-case total computational complexity comparable to that of batch PG methods.

3. Hybrid sampling schemes can accelerate convergence depending on the problem instance.

Abstract

In this paper, we present a new policy gradient (PG) method, namely the block policy mirror descent (BPMD) method, for solving a class of regularized reinforcement learning (RL) problems with (strongly) convex regularizers. Compared to traditional PG methods with a batch update rule, which visit and update the policy for every state, the BPMD method has cheap per-iteration computation via a partial update rule that performs the policy update on a sampled state. Despite the nonconvex nature of the problem and the partial update rule, we provide a unified analysis for several sampling schemes, and show that BPMD achieves fast linear convergence to the global optimality. In particular, uniform sampling leads to worst-case total computational complexity comparable to that of batch PG methods. A necessary and sufficient condition for convergence with on-policy sampling is also identified. With a…
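The partial update rule described in the abstract can be illustrated with a small tabular sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the cost-minimization formulation, the negative-entropy regularizer with weight `tau`, the KL divergence as the mirror-descent distance, exact policy evaluation, uniform state sampling, and the helper names `regularized_values`, `bpmd_step`, and `random_mdp`. Under those assumptions the single-state update has the closed form `p(a) ∝ pi(a|s)^(1/(1+eta*tau)) * exp(-eta*Q(s,a)/(1+eta*tau))`.

```python
import numpy as np

def regularized_values(P, c, pi, gamma, tau):
    """Exact regularized value function of a tabular policy (assumed setup).

    P: (S, A, S) transition probabilities, c: (S, A) costs, pi: (S, A) policy.
    The per-state regularizer is assumed to be h(s) = tau * sum_a pi log pi.
    """
    num_states = c.shape[0]
    h = tau * np.sum(pi * np.log(pi), axis=1)
    P_pi = np.einsum("sa,sat->st", pi, P)      # state-to-state transitions under pi
    cost_pi = np.sum(pi * c, axis=1) + h       # expected regularized one-step cost
    V = np.linalg.solve(np.eye(num_states) - gamma * P_pi, cost_pi)
    return V, h

def bpmd_step(P, c, pi, gamma, tau, eta, rng):
    """One block update: change the policy only at one uniformly sampled state."""
    V, h = regularized_values(P, c, pi, gamma, tau)
    s = rng.integers(c.shape[0])               # uniform sampling (assumption)
    q_s = c[s] + h[s] + gamma * P[s] @ V       # Q-values at the sampled state only
    # Closed-form KL-prox step with negative-entropy regularization (assumption).
    logits = (np.log(pi[s]) - eta * q_s) / (1.0 + eta * tau)
    logits -= logits.max()                     # numerical stability
    new_pi = pi.copy()
    new_pi[s] = np.exp(logits) / np.exp(logits).sum()
    return new_pi, s

def random_mdp(num_states, num_actions, rng):
    """Hypothetical random test problem with costs in [0, 1)."""
    P = rng.random((num_states, num_actions, num_states))
    P /= P.sum(axis=2, keepdims=True)
    c = rng.random((num_states, num_actions))
    return P, c

rng = np.random.default_rng(0)
P, c = random_mdp(5, 3, rng)
pi = np.full((5, 3), 1.0 / 3.0)                # start from the uniform policy
V_init, _ = regularized_values(P, c, pi, gamma=0.9, tau=0.1)
for _ in range(300):
    pi, _ = bpmd_step(P, c, pi, gamma=0.9, tau=0.1, eta=1.0, rng=rng)
V_final, _ = regularized_values(P, c, pi, gamma=0.9, tau=0.1)
print(V_init.mean(), V_final.mean())
```

Each call to `bpmd_step` leaves every row of the policy untouched except the sampled one. The sketch uses a full linear solve for policy evaluation purely for simplicity, so it does not reflect the per-iteration cost discussed in the abstract; it only shows the shape of a single-state update.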

Taxonomy

Topics: Machine Learning and ELM · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques