Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees

Hsin-En Su; Yen-Ju Chen; Ping-Chun Hsieh; Xi Liu

arXiv:2212.05237·cs.LG·December 13, 2022

TL;DR

This paper introduces CAPO, a coordinate ascent-based off-policy RL algorithm that guarantees global convergence without requiring distribution correction, and demonstrates its effectiveness with neural policies.

Contribution

The paper proposes CAPO, an off-policy actor-critic method that avoids distribution mismatch issues and provides theoretical convergence guarantees.

Findings

1. CAPO converges globally under general coordinate selection.

2. CAPO achieves competitive performance in experiments.

3. CAPO extends to neural policies for practical use.

Abstract

We revisit the domain of off-policy policy optimization in RL from the perspective of coordinate ascent. One commonly used approach is to leverage the off-policy policy gradient to optimize a surrogate objective: the expected total discounted return of the target policy with respect to the state distribution of the behavior policy. However, this approach has been shown to suffer from the distribution mismatch issue, and significant effort is therefore needed to correct this mismatch, either via state distribution correction or via a counterfactual method. In this paper, we rethink off-policy learning via Coordinate Ascent Policy Optimization (CAPO), an off-policy actor-critic algorithm that decouples policy improvement from the state distribution of the behavior policy without using the policy gradient. This design obviates the need for distribution correction or importance…
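To make the coordinate-ascent idea concrete, here is a minimal sketch, not the paper's implementation. The abstract is truncated and does not state the update rule, so everything below is an assumption for illustration: a tabular softmax policy, a small randomly generated MDP, exact policy evaluation standing in for a critic, and a sign-of-advantage step applied only to the state-action coordinates that a uniform behavior sampler happens to pick. The names `softmax_policy`, `evaluate`, `coordinate_step`, and `run_demo` are hypothetical.

```python
import numpy as np

def softmax_policy(theta):
    """Row-wise softmax over actions; theta has shape (S, A)."""
    z = theta - theta.max(axis=1, keepdims=True)  # subtract max for stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def evaluate(P, R, pi, gamma):
    """Exact Q and V of policy pi. P: (S, A, S) transitions, R: (S, A) rewards."""
    S = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)             # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * (P @ V)                 # (S, A, S) @ (S,) -> (S, A)
    return Q, V

def coordinate_step(theta, batch, advantage, step_size=1.0):
    """Assumed update: move only the sampled (s, a) coordinates, each in the
    direction of the sign of its advantage. No importance weights are used."""
    new_theta = theta.copy()
    for s, a in batch:
        new_theta[s, a] += step_size * np.sign(advantage[s, a])
    return new_theta

def run_demo(num_iters=200, seed=0):
    rng = np.random.default_rng(seed)
    S, A, gamma = 4, 3, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))  # random MDP, shape (S, A, S)
    R = rng.uniform(size=(S, A))
    start = np.full(S, 1.0 / S)                 # uniform start distribution
    theta = np.zeros((S, A))
    returns = []
    for _ in range(num_iters):
        pi = softmax_policy(theta)
        Q, V = evaluate(P, R, pi, gamma)
        returns.append(start @ V)
        # Behavior side: coordinates are drawn uniformly at random,
        # independently of the current target policy.
        batch = [(rng.integers(S), rng.integers(A)) for _ in range(2)]
        theta = coordinate_step(theta, batch, Q - V[:, None])
    return np.array(returns)

if __name__ == "__main__":
    curve = run_demo()
    print(f"return at start: {curve[0]:.3f}, at end: {curve[-1]:.3f}")
```

In this sketch the behavior sampler only decides which coordinates get updated; the size and direction of each step depend on the advantage alone, so no state-distribution or importance correction enters the update. Under these assumptions each step raises the probability of actions with positive advantage and lowers it for actions with negative advantage, so the expected return does not decrease from one iteration to the next.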


Taxonomy

Topics: Reinforcement Learning in Robotics