On the Convergence of Single-Timescale Actor-Critic
Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

TL;DR
This paper proves that the single-timescale actor-critic algorithm for infinite-horizon discounted MDPs with finite state spaces converges to an ε-close globally optimal policy with sample complexity O(ε^{-3}), provided the actor and critic step sizes both decay as O(k^{-2/3}).
Contribution
It introduces a new analytical framework for the coupled actor and critic recursions and uses it to establish the first global convergence proof with improved sample complexity for single-timescale AC algorithms.
Findings
Converges to an ε-close globally optimal policy with sample complexity O(ε^{-3})
Requires actor and critic step sizes to decay as O(k^{-2/3})
Improves upon previous complexity bounds for actor-critic algorithms
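To make the step-size finding concrete, here is a minimal sketch of a generic tabular single-timescale actor-critic loop in which the actor and the critic share the same O(k^{-2/3}) schedule. It is an illustration under stated assumptions, not the paper's exact algorithm: the softmax parameterization, the SARSA-style critic, sampling along a single trajectory, the constant `c`, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def step_size(k, c=1.0):
    """Step size decaying as O(k^{-2/3}); the constant c is an assumption."""
    return c * (k + 1) ** (-2.0 / 3.0)


def softmax_policy(theta):
    """Row-wise softmax over actions for tabular logits theta[s, a]."""
    z = theta - theta.max(axis=1, keepdims=True)  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def single_timescale_ac(P, R, gamma=0.9, num_iters=5000, seed=0):
    """Generic single-timescale actor-critic sketch (illustrative only).

    P[s, a, s'] holds transition probabilities and R[s, a] holds rewards.
    Actor and critic are each updated once per sample with the same schedule.
    """
    rng = np.random.default_rng(seed)
    num_states, num_actions = R.shape
    theta = np.zeros((num_states, num_actions))  # actor: policy logits
    Q = np.zeros((num_states, num_actions))      # critic: action-value table
    s = rng.integers(num_states)
    for k in range(num_iters):
        pi = softmax_policy(theta)
        a = rng.choice(num_actions, p=pi[s])
        s_next = rng.choice(num_states, p=P[s, a])
        a_next = rng.choice(num_actions, p=pi[s_next])

        eta = step_size(k)   # actor step size
        beta = step_size(k)  # critic step size, same decay rate

        # Actor: score-function step using the current critic estimate.
        grad_log = -pi[s]          # gradient of log pi(a|s) w.r.t. theta[s]
        grad_log[a] += 1.0
        theta[s] += eta * Q[s, a] * grad_log

        # Critic: one temporal-difference step on the sampled transition.
        td_error = R[s, a] + gamma * Q[s_next, a_next] - Q[s, a]
        Q[s, a] += beta * td_error

        s = s_next
    return softmax_policy(theta), Q


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP: uniform transitions, action 1 pays reward 1.
    P = np.full((2, 2, 2), 0.5)
    R = np.array([[0.0, 1.0], [0.0, 1.0]])
    policy, q_values = single_timescale_ac(P, R)
    print(policy)
```

The point of the sketch is the schedule: both updates use `step_size(k)`, so neither the actor nor the critic runs on a faster timescale than the other.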
Abstract
We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. To this end, we introduce an elegant analytical framework for handling the complex, coupled recursions inherent in the algorithm. Leveraging this framework, we establish that the algorithm converges to an \( \epsilon \)-close globally optimal policy with a sample complexity of \( O(\epsilon^{-3}) \). This significantly improves upon the existing complexity for achieving an \( \epsilon \)-close stationary policy, which, by the gradient domination lemma, is equivalent to a corresponding complexity for achieving an \( \epsilon \)-close globally optimal policy. Furthermore, we demonstrate that to achieve this improvement, the step sizes for both the actor and critic must decay as \(…
Taxonomy
Topics: Evolutionary Algorithms and Applications · Neural Networks and Reservoir Computing · Computability, Logic, AI Algorithms
