Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup
Han Shen, Kaiqing Zhang, Mingyi Hong, Tianyi Chen

TL;DR
This paper gives the first non-asymptotic convergence analysis of the asynchronous advantage actor-critic (A3C) algorithm and proves that it achieves linear speedup from parallelism, with numerical experiments supporting the theory.
Contribution
It establishes non-asymptotic convergence guarantees for A3C and demonstrates its linear speedup, providing the first theoretical validation of its parallelism benefits.
Findings
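Under i.i.d. sampling, A3C achieves a sample complexity of O(ε^{-2.5}/N) per worker, where N is the number of workers.
Under both i.i.d. and Markovian sampling, A3C converges locally with general policy approximation and globally with softmax policy parameterization.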
Numerical experiments confirm theoretical results.
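As a rough illustration of what the O(ε^{-2.5}/N) per-worker sample complexity means, the sketch below evaluates ε^{-2.5}/N for several worker counts. The function name per_worker_samples and the constant c are hypothetical: c stands in for the constant hidden by the O(·) notation, which this summary does not specify, so the numbers show only the scaling in N and are not actual sample counts.

```python
def per_worker_samples(eps: float, n_workers: int, c: float = 1.0) -> float:
    """Illustrative per-worker sample count c * eps^(-2.5) / N.

    c is a hypothetical placeholder for the constant hidden by O(.);
    it is not taken from the paper.
    """
    if eps <= 0 or n_workers < 1:
        raise ValueError("eps must be positive and n_workers at least 1")
    return c * eps ** -2.5 / n_workers


if __name__ == "__main__":
    eps = 0.01  # target accuracy, chosen arbitrarily for illustration
    for n in (1, 2, 4, 8):
        # Doubling the number of workers halves the per-worker requirement,
        # so the total across workers stays the same.
        print(n, per_worker_samples(eps, n))
```

Under this scaling, the total number of samples across all workers is independent of N, which is what linear speedup refers to here.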
Abstract
Asynchronous and parallel implementation of standard reinforcement learning (RL) algorithms is a key enabler of the tremendous success of modern RL. Among many asynchronous RL algorithms, arguably the most popular and effective one is the asynchronous advantage actor-critic (A3C) algorithm. Although A3C is becoming the workhorse of RL, its theoretical properties are still not well understood, including its non-asymptotic analysis and the performance gain of parallelism (a.k.a. linear speedup). This paper revisits the A3C algorithm and establishes its non-asymptotic convergence guarantees. Under both i.i.d. and Markovian sampling, we establish the local convergence guarantee for A3C in the general policy approximation case and the global convergence guarantee in softmax policy parameterization. Under i.i.d. sampling, A3C obtains sample complexity of O(ε^{-2.5}/N) per…
Taxonomy
Topics: Electric Power System Optimization · Global Financial Crisis and Policies · Economic theories and models
Methods: Softmax · Entropy Regularization · Convolution · Dense Connections · A3C
