On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction

Jiawei Huang; Nan Jiang

arXiv:2106.00993·cs.LG·February 15, 2022·1 cites

TL;DR

This paper analyzes the convergence of off-policy policy optimization algorithms with density-ratio correction and proposes two strategies with finite-time guarantees: P-SREDA, whose $O(\epsilon^{-3})$ rate is optimal in its dependence on $\epsilon$, and O-SPIM, whose total complexity is $O(\epsilon^{-4})$.

Contribution

It introduces two new algorithms, P-SREDA and O-SPIM, with proven convergence rates for off-policy policy improvement under function approximation.

Findings

01

P-SREDA has a convergence rate of $O(\epsilon^{-3})$, whose dependence on $\epsilon$ is optimal.

02

O-SPIM converges to a stationary point with total complexity $O(\epsilon^{-4})$.

03

The methods provide finite-time convergence guarantees for off-policy algorithms.

Abstract

In this paper, we study the convergence properties of off-policy policy improvement algorithms with state-action density ratio correction in the function approximation setting, where the objective function is formulated as a max-max-min optimization problem. We characterize the bias of the learning objective and present two strategies with finite-time convergence guarantees. In our first strategy, we present the algorithm P-SREDA with convergence rate $O(\epsilon^{-3})$, whose dependency on $\epsilon$ is optimal. In our second strategy, we propose a new off-policy actor-critic style algorithm named O-SPIM. We prove that O-SPIM converges to a stationary point with total complexity $O(\epsilon^{-4})$, which matches the convergence rate of some recent actor-critic algorithms in the on-policy setting.
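As a rough illustration of what state-action density ratio correction means, the sketch below reweights rewards collected under a behavior distribution so that their average estimates the expected reward under a target policy's distribution. It is not P-SREDA or O-SPIM and is not taken from the paper: the function names, the three-element toy state-action space, and the assumption that the ratio $w(s, a) = d^{\pi}(s, a) / d^{\mu}(s, a)$ is known exactly are all hypothetical. In practice the ratio has to be learned.

```python
import numpy as np


def density_ratio_corrected_return(d_behavior, d_target, rewards):
    """Exact reweighted expected reward over a finite state-action space.

    Assumes d_behavior > 0 wherever d_target > 0 (hypothetical toy setting).
    """
    w = d_target / d_behavior  # density ratio w(s, a), assumed known here
    return float(np.sum(d_behavior * w * rewards))


def sampled_estimate(d_behavior, d_target, rewards, n, seed=0):
    """Monte Carlo version: draw (s, a) indices from the behavior distribution."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(rewards), size=n, p=d_behavior)
    w = d_target / d_behavior
    # Average of ratio-weighted rewards over off-policy samples.
    return float(np.mean(w[idx] * rewards[idx]))


# Toy distributions over three state-action pairs (illustrative values only).
d_mu = np.array([0.5, 0.3, 0.2])   # behavior (data-collecting) distribution
d_pi = np.array([0.2, 0.3, 0.5])   # target policy's distribution
r = np.array([1.0, 0.0, 2.0])      # reward for each state-action pair

exact = density_ratio_corrected_return(d_mu, d_pi, r)
approx = sampled_estimate(d_mu, d_pi, r, n=200_000)
```

Here `exact` equals the on-policy expectation $\sum_{s,a} d^{\pi}(s, a)\, r(s, a)$, and `approx` approaches it as the number of off-policy samples grows.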

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms