Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup

Han Shen; Kaiqing Zhang; Mingyi Hong; Tianyi Chen

arXiv:2012.15511·cs.LG·August 2, 2023·6 cites

Open Access

TL;DR

This paper provides the first non-asymptotic convergence analysis of the asynchronous advantage actor-critic (A3C) algorithm and shows that its per-worker sample complexity improves linearly with the number of workers (linear speedup). Numerical experiments support the theory.

Contribution

It establishes non-asymptotic convergence guarantees for A3C and demonstrates its linear speedup, providing the first theoretical validation of its parallelism benefits.

Findings

01

Under i.i.d. sampling, A3C achieves a sample complexity of O(ε^{-2.5}/N) per worker, where N is the number of workers and ε is the target accuracy.

02

Under both i.i.d. and Markovian sampling, A3C converges locally with general policy approximation and globally with softmax policy parameterization.

03

Numerical experiments confirm theoretical results.
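
The O(ε^{-2.5}/N) per-worker bound in the findings above can be read as simple arithmetic: for a fixed target accuracy ε, the number of samples each worker needs shrinks in proportion to the number of workers N. The sketch below only illustrates that scaling. The function name `per_worker_samples` and the `constant` parameter are assumptions made for illustration; the bound hides a constant that this page does not report, so the absolute numbers are not meaningful, only the ratios.

```python
def per_worker_samples(epsilon: float, num_workers: int, constant: float = 1.0) -> float:
    """Per-worker sample count implied by the O(eps^{-2.5} / N) scaling.

    `constant` stands in for the hidden constant of the bound (hypothetical value).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return constant * epsilon ** -2.5 / num_workers


if __name__ == "__main__":
    eps = 0.01  # illustrative target accuracy
    baseline = per_worker_samples(eps, 1)
    for n in (1, 2, 4, 8, 16):
        samples = per_worker_samples(eps, n)
        # The ratio to the single-worker case is the speedup implied by the bound.
        print(f"N={n:2d}  per-worker samples ~ {samples:.3e}  speedup ~ {baseline / samples:.1f}x")
```

Doubling N halves the per-worker count in this model, which is what "linear speedup" refers to.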

Abstract

Asynchronous and parallel implementation of standard reinforcement learning (RL) algorithms is a key enabler of the tremendous success of modern RL. Among many asynchronous RL algorithms, arguably the most popular and effective one is the asynchronous advantage actor-critic (A3C) algorithm. Although A3C is becoming the workhorse of RL, its theoretical properties are still not well-understood, including its non-asymptotic analysis and the performance gain of parallelism (a.k.a. linear speedup). This paper revisits the A3C algorithm and establishes its non-asymptotic convergence guarantees. Under both i.i.d. and Markovian sampling, we establish the local convergence guarantee for A3C in the general policy approximation case and the global convergence guarantee in softmax policy parameterization. Under i.i.d. sampling, A3C obtains sample complexity of $O(\epsilon^{-2.5}/N)$ per…
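
The "softmax policy parameterization" named in the abstract refers to policies whose action probabilities are a softmax over per-action scores. The following is a minimal sketch of that idea, assuming one real-valued score per action in a given state. The function name `softmax_policy` is hypothetical, and how the paper constructs the scores is not shown in this excerpt.

```python
import math
from typing import List


def softmax_policy(scores: List[float]) -> List[float]:
    """Turn per-action scores for one state into action probabilities."""
    if not scores:
        raise ValueError("scores must be non-empty")
    # Subtracting the maximum score avoids overflow and leaves the result unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Illustrative scores for three actions; higher score means higher probability.
    print(softmax_policy([1.0, 2.0, 3.0]))
```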

Taxonomy

Topics: Electric Power System Optimization · Global Financial Crisis and Policies · Economic theories and models

Methods: Softmax · Entropy Regularization · Convolution · Dense Connections · A3C