Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
Tengyu Xu, Zhe Wang, Yingbin Liang

TL;DR
This paper improves the theoretical understanding of actor-critic algorithms in reinforcement learning, showing that they require fewer samples to reach optimality than policy gradient methods, especially under Markovian sampling and mini-batch updates.
Contribution
It provides the first convergence rate analysis of AC and NAC under Markovian sampling with mini-batch data, demonstrating significant sample complexity improvements over prior methods.
Findings
Sample complexity for AC improves by an order of O(ε⁻¹ log(1/ε)) over the best previously known bound.
Sample complexity for NAC improves by an order of O(ε⁻¹ / log(1/ε)) over prior results.
AC and NAC outperform PG and NPG in sample efficiency by factors depending on the discount factor (γ).
Abstract
The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and natural actor-critic (NAC) algorithms has been established recently, but under independent and identically distributed (i.i.d.) sampling and single-sample update at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data for each iteration, and with actor having general policy class approximation. We show that the overall sample complexity for a mini-batch AC to attain an ε-accurate stationary point improves the best known sample complexity of AC by an order of O(ε⁻¹ log(1/ε)), and the overall sample complexity for a mini-batch NAC to attain an ε-accurate…
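To make the setting in the abstract concrete, here is a minimal Python sketch of a mini-batch actor-critic loop in which every sample comes from one continuing Markovian trajectory rather than from i.i.d. draws. It is an illustration under stated assumptions, not the authors' algorithm: the tabular softmax policy, the tabular TD(0) critic, the function names, and all step sizes and batch sizes are hypothetical choices, and the sketch omits details of the paper's analysis such as how the visitation distribution is sampled.

```python
import numpy as np

def softmax_policy(theta, s):
    """Action probabilities of a tabular softmax policy in state s."""
    z = theta[s] - theta[s].max()   # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def minibatch_actor_critic(P, R, gamma=0.9, iters=200, batch=32,
                           critic_steps=2, alpha=0.5, beta=0.1, seed=0):
    """Illustrative mini-batch AC on a small tabular MDP (assumed setup).

    P[s, a] is the next-state distribution and R[s, a] the reward.
    All samples are consecutive transitions of a single trajectory.
    """
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    theta = np.zeros((n_s, n_a))    # actor parameters (policy logits)
    v = np.zeros(n_s)               # critic parameters (state values)
    s = 0                           # current state of the single trajectory
    for _ in range(iters):
        # Critic: a few mini-batch TD(0) updates with the policy held fixed.
        for _ in range(critic_steps):
            g = np.zeros(n_s)
            for _ in range(batch):
                a = rng.choice(n_a, p=softmax_policy(theta, s))
                s2 = rng.choice(n_s, p=P[s, a])
                g[s] += R[s, a] + gamma * v[s2] - v[s]   # TD error
                s = s2
            v += beta * g / batch
        # Actor: one mini-batch policy-gradient step using the TD error
        # from the critic as the advantage estimate.
        grad = np.zeros_like(theta)
        for _ in range(batch):
            pi = softmax_policy(theta, s)
            a = rng.choice(n_a, p=pi)
            s2 = rng.choice(n_s, p=P[s, a])
            delta = R[s, a] + gamma * v[s2] - v[s]
            score = -pi                 # gradient of log pi(a|s) w.r.t. logits
            score[a] += 1.0
            grad[s] += delta * score
            s = s2
        theta += alpha * grad / batch
    return theta, v
```

Calling `minibatch_actor_critic(P, R)` with a transition array `P` of shape `(n_states, n_actions, n_states)` and a reward array `R` of shape `(n_states, n_actions)` returns the learned logits and value estimates. Averaging over a mini-batch before each update is the feature the abstract contrasts with single-sample updates.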
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research
