Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

Tengyu Xu; Zhe Wang; Yingbin Liang

arXiv:2005.03557 · cs.LG · May 11, 2020 · 34 cites

TL;DR

This paper establishes the first non-asymptotic convergence rates for two time-scale actor-critic (AC) and natural actor-critic (NAC) algorithms in reinforcement learning, with sample complexity bounds under Markovian sampling.

Contribution

It introduces finite-sample convergence rates for two time-scale AC and NAC algorithms, which were previously only known to converge asymptotically.

Findings

01

AC requires $\tilde{O}(\frac{1}{\epsilon^{2.5}})$ samples to reach an $\epsilon$-accurate stationary point

02

NAC requires $\tilde{O}(\frac{1}{\epsilon^{4}})$ samples to reach an $\epsilon$-accurate globally optimal point

03

Novel techniques for bounding the actor's bias error under dynamically changing Markovian sampling and for analyzing the convergence of the linear critic
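A quick way to see how the two rates scale is to evaluate only their leading terms. The sketch below drops constants and the logarithmic factors hidden in $\tilde{O}$, and the helper names are hypothetical. The two bounds also target different criteria (an $\epsilon$-accurate stationary point for AC, an $\epsilon$-accurate global optimum for NAC), so the comparison illustrates scaling in $\epsilon$, not which algorithm is cheaper in practice.

```python
# Leading-order terms of the reported bounds only; constants and the log
# factors hidden in O-tilde are dropped, so only the scaling in eps matters.


def ac_samples(eps: float) -> float:
    """Leading term eps**-2.5 of the two time-scale AC bound (hypothetical helper)."""
    return eps ** -2.5


def nac_samples(eps: float) -> float:
    """Leading term eps**-4 of the two time-scale NAC bound (hypothetical helper)."""
    return eps ** -4


for eps in (0.1, 0.05, 0.01):
    ratio = nac_samples(eps) / ac_samples(eps)  # equals eps**-1.5
    print(f"eps={eps}: AC ~ {ac_samples(eps):.3g}, "
          f"NAC ~ {nac_samples(eps):.3g}, NAC/AC ~ {ratio:.3g}")
```

Halving $\epsilon$ multiplies the AC term by $2^{2.5} \approx 5.7$ and the NAC term by $2^4 = 16$.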

Abstract

As important types of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first, nested-loop design, each actor update of the policy is followed by an entire loop of critic updates of the value function, and the finite-sample analysis of such AC and NAC algorithms has recently been well established. The second, two time-scale design, in which the actor and critic update simultaneously but with different learning rates, has far fewer tuning parameters than the nested-loop design and is hence substantially easier to implement. Although two time-scale AC and NAC have been shown to converge in the literature, their finite-sample convergence rates have not been established. In this paper, we provide the first such non-asymptotic convergence rate for two time-scale AC and NAC under…
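The two time-scale design described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the paper's algorithm or setting: a discounted objective, a tabular softmax actor, a linear TD(0) critic with one-hot features, hypothetical step-size constants and exponents, and, for the natural variant, a compatible-feature regression of the kind used in earlier natural actor-critic work. The point it shows is that actor and critic update in the same loop along one Markovian trajectory, with the critic's step size $\beta_t$ decaying more slowly than the actor's $\alpha_t$.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def policy_value(P, R, theta, gamma):
    """Exact discounted V^pi for the softmax policy pi = softmax(theta)."""
    pi = softmax(theta)
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)            # expected one-step reward under pi
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)


def two_timescale_ac(P, R, gamma=0.8, steps=10_000, natural=False,
                     c_alpha=0.1, c_beta=1.0, sigma=0.6, nu=0.4, seed=0):
    """Toy two time-scale (natural) actor-critic on one Markovian trajectory.

    Hypothetical choices: actor step alpha_t = c_alpha / (1 + t)**sigma (slow),
    critic step beta_t = c_beta / (1 + t)**nu (fast), with nu < sigma.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    theta = np.zeros((S, A))  # tabular softmax actor parameters
    w = np.zeros(S)           # linear critic with one-hot features: V(s) ~ w[s]
    u = np.zeros((S, A))      # natural variant only: compatible-feature weights
    s = rng.integers(S)
    for t in range(steps):
        alpha = c_alpha / (1 + t) ** sigma
        beta = c_beta / (1 + t) ** nu
        pi_s = softmax(theta[s])
        a = rng.choice(A, p=pi_s)
        s_next = rng.choice(S, p=P[s, a])  # samples come from a single trajectory
        # TD error: drives the critic and serves as a noisy advantage estimate.
        delta = R[s, a] + gamma * w[s_next] - w[s]
        # Score function grad_theta log pi(a|s); nonzero only in row s.
        score = np.zeros((S, A))
        score[s] = -pi_s
        score[s, a] += 1.0
        w[s] += beta * delta  # critic: fast time scale
        if natural:
            # Least-mean-squares regression of delta on the score features;
            # u approximates a natural-gradient direction F^+ E[delta * score].
            u -= beta * (np.sum(score * u) - delta) * score
            theta += alpha * u
        else:
            theta += alpha * delta * score  # vanilla AC: policy-gradient step
        s = s_next
    return theta, w


# Hypothetical toy MDP: action 0 pays 1, action 1 pays 0; transitions uniform.
S, A, gamma = 2, 2, 0.8
P = np.full((S, A, S), 1.0 / S)
R = np.tile([1.0, 0.0], (S, 1))
v_uniform = policy_value(P, R, np.zeros((S, A)), gamma).mean()
theta_ac, w_ac = two_timescale_ac(P, R, gamma)
theta_nac, _ = two_timescale_ac(P, R, gamma, natural=True)
print(v_uniform,
      policy_value(P, R, theta_ac, gamma).mean(),
      policy_value(P, R, theta_nac, gamma).mean())
```

Because $\nu < \sigma$, the ratio $\beta_t/\alpha_t = (c_\beta/c_\alpha)(1+t)^{\sigma-\nu}$ grows over time, so the critic effectively sees a slowly moving policy; this separation is what lets a single loop stand in for the nested critic loop. The printed values compare the uniform policy with the policies learned by the AC and NAC variants on the toy MDP.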

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research