On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Zhuoran Yang; Yongxin Chen; Mingyi Hong; Zhaoran Wang

arXiv:1907.06246 · cs.LG · July 16, 2019 · 27 cites

Open Access

TL;DR

This paper gives a nonasymptotic convergence analysis of the actor-critic algorithm applied to linear quadratic regulators (LQR) with ergodic cost. It proves that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate.

Contribution

It establishes the first nonasymptotic global convergence proof for actor-critic in the LQR setting and shows that the convergence rate is linear.

Findings

01

Actor-critic converges to a globally optimal pair of actor (policy) and critic (action-value function).

02

Convergence occurs at a linear rate in the LQR setting.

03

The analysis may serve as a preliminary step toward understanding bilevel optimization with nonconvex subproblems.
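
For reference, the "linear rate" in finding 02 is usually understood as a geometric decay of the optimality gap. The statement below is the standard definition, not the paper's theorem; the symbols J(K_t) (ergodic cost of the policy at iteration t), J^* (optimal cost), C, and \rho are notation assumed here, and the paper's actual constants are not reproduced.

```latex
J(K_t) - J^* \le C\,\rho^{t}, \qquad \rho \in (0,1),\ C > 0
\quad\Longrightarrow\quad
J(K_t) - J^* \le \varepsilon
\ \text{ whenever } \
t \ge \frac{\log(C/\varepsilon)}{\log(1/\rho)}.
```

In other words, each additional digit of accuracy costs a roughly constant number of iterations.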

Abstract

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
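
The alternating structure described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's algorithm: the learned critic is replaced by exact, model-based policy evaluation (a Lyapunov equation), the policy is assumed deterministic and linear (u = -Kx, no exploration noise), and the actor takes a damped step along a direction proportional to the natural policy gradient for LQR. The system matrices, the `damping` parameter, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical 2-state, 1-input system: x' = A x + B u + noise, noise ~ N(0, PSI).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)          # state cost weight
R = np.array([[1.0]])  # control cost weight
PSI = np.eye(2)        # process noise covariance


def solve_lyapunov(A_cl, M):
    """Solve P = M + A_cl.T @ P @ A_cl (needs spectral radius of A_cl < 1)."""
    n = A_cl.shape[0]
    # Row-major identity: vec(X @ P @ Y) = kron(X, Y.T) @ vec(P).
    lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    return np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)


def evaluate(K):
    """'Critic' stand-in: exact value matrix P_K and ergodic cost tr(P_K PSI)."""
    P = solve_lyapunov(A - B @ K, Q + K.T @ R @ K)
    return P, float(np.trace(P @ PSI))


def actor_step(K, P, damping=0.5):
    """'Actor' update: damped step along E_K = (R + B'PB) K - B'PA."""
    G = R + B.T @ P @ B
    E = G @ K - B.T @ P @ A
    return K - (damping / np.linalg.norm(G, 2)) * E


def riccati_gain(iters=1000):
    """Reference optimal gain from Riccati value iteration."""
    P = Q.copy()
    for _ in range(iters):
        G = R + B.T @ P @ B
        P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def run(iters=100):
    """Alternate evaluation and improvement, starting from the stabilizing gain K = 0."""
    K = np.zeros((1, 2))
    costs = []
    for _ in range(iters):
        P, cost = evaluate(K)   # evaluate the current policy
        costs.append(cost)
        K = actor_step(K, P)    # improve the policy
    return K, costs
```

Calling `run()` returns the final gain and the cost at each iteration; comparing the gain with `riccati_gain()` and inspecting how the gap `costs[t] - costs[-1]` shrinks gives a rough picture of geometric convergence in this idealized setting. The paper's setting is harder because the critic must itself be estimated online from samples.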

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control