Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Tengyu Xu; Zhe Wang; Yingbin Liang

arXiv:2004.12956 · cs.LG · February 15, 2021 · 23 cites

TL;DR

This paper improves the theoretical understanding of actor-critic algorithms in reinforcement learning. It shows that, under Markovian sampling with mini-batch updates, they need fewer samples than policy gradient methods to reach a given accuracy.

Contribution

The paper provides the first convergence rate analysis of actor-critic (AC) and natural actor-critic (NAC) under Markovian sampling with mini-batch data, and shows that the resulting sample complexity bounds improve substantially on prior results.
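The key difference from earlier analyses is how each mini-batch is collected. The following is a minimal sketch, assuming a hypothetical three-state Markov chain (the matrix `P` and all helper names are made up for illustration and do not come from the paper). It contrasts a Markovian mini-batch with an i.i.d. mini-batch. In the Markovian case, the transitions are consecutive steps of one trajectory, and the final state seeds the next batch. In the i.i.d. case, start states are drawn independently from the chain's stationary distribution.

```python
import random

# Hypothetical 3-state Markov chain (row s holds P(s' | s)); not from the paper.
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.0, 0.8]]
STATES = range(len(P))


def next_state(s, rng):
    """Draw s' ~ P(. | s)."""
    return rng.choices(STATES, weights=P[s])[0]


def stationary(P, iters=2000):
    """Approximate the stationary distribution by power iteration (mu <- mu P)."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu


def markovian_minibatch(s, batch_size, rng):
    """Consecutive transitions along one trajectory, so samples are correlated."""
    batch = []
    for _ in range(batch_size):
        s_next = next_state(s, rng)
        batch.append((s, s_next))
        s = s_next
    return batch, s  # the last state seeds the next mini-batch


def iid_minibatch(mu, batch_size, rng):
    """Each start state is drawn independently from mu, then stepped once."""
    batch = []
    for _ in range(batch_size):
        s = rng.choices(STATES, weights=mu)[0]
        batch.append((s, next_state(s, rng)))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    batch, s = markovian_minibatch(0, 8, rng)
    print(batch, s)
    print(iid_minibatch(stationary(P), 8, rng))
```

Only the Markovian sampler needs to carry state between calls. That dependence across consecutive samples is exactly what an analysis under Markovian sampling must account for, and it is absent in the i.i.d. setting.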

Findings

01

Sample complexity for AC improves over the best previously known bound by an order of $O(\epsilon^{-1}\log(1/\epsilon))$.

02

Sample complexity for NAC improves by order () () over prior results.

03

AC and NAC outperform policy gradient (PG) and natural policy gradient (NPG) in sample efficiency by factors that depend on the discount factor.

Abstract

The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and natural actor-critic (NAC) algorithms has been established recently, but under independent and identically distributed (i.i.d.) sampling and single-sample update at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data for each iteration, and with actor having general policy class approximation. We show that the overall sample complexity for a mini-batch AC to attain an $\epsilon$-accurate stationary point improves the best known sample complexity of AC by an order of $O(\epsilon^{-1}\log(1/\epsilon))$, and the overall sample complexity for a mini-batch NAC to attain an $\epsilon$-accurate…
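A mini-batch AC loop with Markovian sampling can be sketched as follows. The sketch rests on several assumptions: a hypothetical 2-state, 2-action discounted MDP, a tabular softmax actor, a tabular TD(0) critic, and samples taken along a single trajectory of the current policy. It is not the authors' algorithm. It ignores the discounted state-visitation sampling and the general policy class the paper analyzes, and the step sizes, batch size, and iteration counts are arbitrary. Each outer iteration first runs several mini-batch TD steps for the critic. It then takes one actor step, which averages the TD error δ = r + γV(s') - V(s) times ∇ log π(a|s) over a fresh mini-batch.

```python
import math
import random

# Hypothetical 2-state, 2-action discounted MDP (not from the paper).
# P1[s][a] = probability that the next state is 1; R[s][a] = reward.
P1 = [[0.9, 0.2], [0.7, 0.1]]
R = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.9


def policy(theta, s):
    """Tabular softmax policy pi_theta(. | s) over two actions."""
    m = max(theta[s])
    e = [math.exp(x - m) for x in theta[s]]
    z = sum(e)
    return [x / z for x in e]


def rollout(theta, s, n, rng):
    """Markovian sampling: n consecutive transitions from the current state."""
    batch = []
    for _ in range(n):
        a = 0 if rng.random() < policy(theta, s)[0] else 1
        s_next = 1 if rng.random() < P1[s][a] else 0
        batch.append((s, a, R[s][a], s_next))
        s = s_next
    return batch, s


def td_error(v, s, r, s_next):
    """One-step TD error r + gamma * V(s') - V(s)."""
    return r + GAMMA * v[s_next] - v[s]


def minibatch_actor_critic(iters=200, critic_steps=10, batch=32,
                           alpha=0.1, beta=0.1, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: softmax logits per state
    v = [0.0, 0.0]                    # critic: tabular value estimate
    s = 0
    for _ in range(iters):
        # Critic: several mini-batch TD(0) steps under the current policy.
        for _ in range(critic_steps):
            data, s = rollout(theta, s, batch, rng)
            g = [0.0, 0.0]
            for si, _a, r, sn in data:
                g[si] += td_error(v, si, r, sn)
            v = [v[k] + beta * g[k] / batch for k in range(2)]
        # Actor: mini-batch policy-gradient step using the TD error as advantage.
        data, s = rollout(theta, s, batch, rng)
        g = [[0.0, 0.0], [0.0, 0.0]]
        for si, ai, r, sn in data:
            delta = td_error(v, si, r, sn)
            pi = policy(theta, si)
            for b in range(2):
                # d/d theta[si][b] of log pi(ai | si) = 1[b == ai] - pi(b | si)
                g[si][b] += delta * ((1.0 if b == ai else 0.0) - pi[b])
        theta = [[theta[k][b] + alpha * g[k][b] / batch for b in range(2)]
                 for k in range(2)]
    return theta, v


if __name__ == "__main__":
    theta, v = minibatch_actor_critic()
    print(policy(theta, 1), v)
```

Averaging over a mini-batch lowers the variance of each critic and actor step compared with single-sample updates, at the cost of more samples per iteration. In this toy MDP, action 0 in state 1 earns reward 2 and tends to keep the chain in state 1, so the learned policy should shift its probability mass toward that action.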

Videos

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research