Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms
Tengyu Xu, Zhe Wang, Yingbin Liang

TL;DR
This paper establishes the first non-asymptotic convergence rates for two time-scale actor-critic and natural actor-critic algorithms in reinforcement learning, providing sample complexity bounds under Markovian sampling.
Contribution
It introduces finite-sample convergence rates for two time-scale AC and NAC algorithms, which were previously only known to converge asymptotically.
Findings
AC requires $\tilde{O}(\frac{1}{\epsilon^{2.5}})$ samples for $\epsilon$-accuracy
NAC requires $\tilde{O}(\frac{1}{\epsilon^{4}})$ samples for $\epsilon$-accuracy
Novel techniques for bias and convergence analysis under Markovian sampling
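To get a feel for how the two stated rates scale, the short Python sketch below evaluates only the leading terms $\epsilon^{-2.5}$ and $\epsilon^{-4}$. It deliberately ignores the constants and logarithmic factors that the $\tilde{O}$ notation hides, so the numbers are orders of magnitude, not sample counts from the paper. The function names are hypothetical and chosen for illustration.

```python
def leading_term(eps: float, exponent: float) -> float:
    """Leading term eps**(-exponent) of a sample-complexity bound.

    Constants and log factors hidden by the tilde-O notation are ignored.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return eps ** (-exponent)


def ac_samples(eps: float) -> float:
    """Leading term of the stated AC bound, eps**(-2.5)."""
    return leading_term(eps, 2.5)


def nac_samples(eps: float) -> float:
    """Leading term of the stated NAC bound, eps**(-4)."""
    return leading_term(eps, 4.0)


if __name__ == "__main__":
    for eps in (0.1, 0.01):
        print(f"eps={eps}: AC ~ {ac_samples(eps):.3g}, NAC ~ {nac_samples(eps):.3g}")
```

Under this simplification, halving $\epsilon$ multiplies the AC term by $2^{2.5} \approx 5.7$ and the NAC term by $2^{4} = 16$.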
Abstract
As an important class of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC) algorithms are typically executed in one of two ways to find optimal policies. In the first, nested-loop design, each actor update of the policy is followed by an entire loop of critic updates of the value function, and the finite-sample analysis of such AC and NAC algorithms has recently been well established. The second, two time-scale design, in which the actor and critic update simultaneously but with different learning rates, has far fewer tuning parameters than the nested-loop design and is hence substantially easier to implement. Although two time-scale AC and NAC have been shown to converge in the literature, their finite-sample convergence rate has not been established. In this paper, we provide the first such non-asymptotic convergence rate for two time-scale AC and NAC under…
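The two time-scale idea described in the abstract, with actor and critic updating on every sample but at different learning rates, can be sketched as follows. This is a minimal illustration and not the paper's algorithm: the two-state MDP, the tabular softmax actor, the TD(0) critic, the discount factor, and the step-size exponents 0.6 and 0.4 are all assumptions made here. The only property taken from the text is that both updates happen simultaneously along a single Markovian trajectory, with the actor step size decaying faster than the critic's.

```python
import numpy as np


def actor_step(t: int) -> float:
    """Slow time scale (assumed schedule, not taken from the paper)."""
    return 0.5 / (1 + t) ** 0.6


def critic_step(t: int) -> float:
    """Fast time scale (assumed schedule, not taken from the paper)."""
    return 0.5 / (1 + t) ** 0.4


def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())  # subtract max for numerical stability
    return z / z.sum()


def two_timescale_ac(num_steps: int = 20000, gamma: float = 0.9, seed: int = 0):
    rng = np.random.default_rng(seed)
    n_states, n_actions = 2, 2
    # Hypothetical toy MDP: P[s, a] is the next-state distribution, R[s, a] the reward.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.1, 0.9]]])
    R = np.array([[0.0, 1.0],
                  [0.5, 2.0]])
    theta = np.zeros((n_states, n_actions))  # actor: softmax logits per state
    v = np.zeros(n_states)                   # critic: tabular value estimates
    s = 0
    for t in range(num_steps):
        pi = softmax(theta[s])
        a = rng.choice(n_actions, p=pi)
        s_next = rng.choice(n_states, p=P[s, a])
        # TD error from the current critic, shared by both updates.
        delta = R[s, a] + gamma * v[s_next] - v[s]
        # Gradient of log pi(a|s) with respect to the logits of state s.
        grad_log = -pi
        grad_log[a] += 1.0
        # Simultaneous updates on the same sample, different step sizes.
        v[s] += critic_step(t) * delta
        theta[s] += actor_step(t) * delta * grad_log
        s = s_next  # continue the single trajectory (Markovian sampling)
    return theta, v


if __name__ == "__main__":
    theta, v = two_timescale_ac()
    print("policy per state:", [softmax(row).round(3) for row in theta])
    print("value estimates:", v.round(3))
```

Because the ratio `actor_step(t) / critic_step(t)` shrinks as `t` grows, the critic sees a nearly fixed policy and can track its value, which is what lets a single loop stand in for the nested-loop design. Only the two step-size schedules need tuning in this sketch.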
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research
