Accelerating Primal-dual Methods for Regularized Markov Decision Processes

Haoya Li, Hsiang-fu Yu, Lexing Ying, and Inderjit Dhillon

arXiv:2202.10506·math.OC·June 13, 2023

TL;DR

This paper introduces a quadratically convexified primal-dual formulation for entropy regularized Markov decision processes. Natural gradient ascent descent on this formulation converges globally at an exponential rate, and a new interpolating metric accelerates convergence further, as demonstrated in numerical experiments.

Contribution

The paper presents a quadratically convexified primal-dual formulation for entropy regularized MDPs and an interpolating metric that accelerates convergence, with theoretical guarantees and empirical validation.

Findings

01. Natural gradient ascent descent on the new formulation has a global convergence guarantee.

02. The convergence rate is exponential.

03. Numerical results show that the interpolating metric accelerates convergence significantly.

Abstract

Entropy regularized Markov decision processes have been widely used in reinforcement learning. This paper is concerned with the primal-dual formulation of the entropy regularized problems. Standard first-order methods suffer from slow convergence due to the lack of strict convexity and concavity. To address this issue, we first introduce a new quadratically convexified primal-dual formulation. The natural gradient ascent descent of the new formulation enjoys a global convergence guarantee and an exponential convergence rate. We also propose a new interpolating metric that further accelerates convergence significantly. Numerical results are provided to demonstrate the performance of the proposed methods under multiple settings.
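
For readers new to the setting, the sketch below illustrates what an entropy regularized MDP looks like on a small tabular problem. It is not the paper's primal-dual method: it uses plain soft value iteration as a stand-in baseline, and it assumes a reward-maximization convention with discount `gamma` and regularization strength `tau`. The function name `soft_value_iteration` and all parameter names are hypothetical, chosen for illustration only.

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, tau=0.1, tol=1e-10, max_iter=10_000):
    """Soft value iteration for a tabular entropy regularized MDP.

    P: array of shape (S, A, S), P[s, a, s'] = transition probability.
    r: array of shape (S, A), reward for taking action a in state s.
    Assumed convention (not taken from the paper): maximize the discounted
    sum of r - tau * log(pi).
    """
    v = np.zeros(r.shape[0])
    for _ in range(max_iter):
        q = r + gamma * (P @ v)                      # state-action values, shape (S, A)
        m = q.max(axis=1)                            # shift for a stable log-sum-exp
        v_new = m + tau * np.log(np.exp((q - m[:, None]) / tau).sum(axis=1))
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    # The regularized optimal policy is a softmax of q / tau.
    q = r + gamma * (P @ v)
    pi = np.exp((q - q.max(axis=1, keepdims=True)) / tau)
    pi /= pi.sum(axis=1, keepdims=True)
    return v, pi

# Usage on a random 3-state, 2-action MDP.
rng = np.random.default_rng(0)
P = rng.random((3, 2, 3))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((3, 2))
v, pi = soft_value_iteration(P, r)
```

The entropy term makes the optimal policy a smooth softmax rather than a hard argmax. The primal-dual formulation discussed in the abstract attacks the same regularized problem from a saddle-point viewpoint instead of iterating this fixed-point map.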

Taxonomy

Topics: Adaptive Dynamic Programming Control · Optimization and Variational Analysis · Reinforcement Learning in Robotics