Achieving $\epsilon^{-2}$ Sample Complexity for Single-Loop Actor-Critic under Minimal Assumptions
Ishaq Hamza, Zaiwei Chen

TL;DR
This paper proves the first $\tilde{O}(\epsilon^{-2})$ sample complexity for off-policy actor-critic methods with a single-loop, single-timescale implementation under minimal assumptions, using a novel Lyapunov drift analysis.
Contribution
The paper introduces a new analytical framework based on coupled Lyapunov drift, which it uses to establish convergence rates for single-loop actor-critic algorithms under minimal assumptions.
Findings
First $\tilde{O}(\epsilon^{-2})$ sample complexity for off-policy actor-critic.
Geometric convergence rate for the actor and $\tilde{O}(1/T)$ for the critic.
Analysis applicable to other coupled iterative algorithms with unbounded iterates.
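The two rates listed under Findings combine into the headline bound roughly as follows. This is a back-of-the-envelope sketch, not the paper's theorem. It assumes that the critic's $\tilde{O}(1/T)$ rate bounds a mean-squared error, that the actor's geometric rate has the form $\rho^T$ for some hypothetical contraction factor $\rho \in (0,1)$, and that a single-loop scheme consumes one sample per iteration, so $T$ iterations cost $T$ samples. Here $Q_T$ (the critic's estimate after $T$ iterations) and $Q^{\pi_T}$ (the true Q-function of the current policy) are notation assumed for illustration.

```latex
% Back-of-the-envelope only; rho and the actor/critic error split are assumptions.
\begin{gathered}
\mathbb{E}\big[\|Q_T - Q^{\pi_T}\|\big]
  \le \sqrt{\mathbb{E}\big[\|Q_T - Q^{\pi_T}\|^2\big]}
  = \tilde{O}\big(T^{-1/2}\big) \quad \text{(Jensen)}, \\
\underbrace{\rho^{T}}_{\text{actor}}
  + \underbrace{\tilde{O}\big(T^{-1/2}\big)}_{\text{critic}} \le \epsilon
  \quad \text{once} \quad T = \tilde{O}\big(\epsilon^{-2}\big).
\end{gathered}
```

The geometric term falls below $\epsilon/2$ after only $T \ge \log(2/\epsilon)/\log(1/\rho)$ iterations, so under these assumptions the critic's statistical error dominates and sets the $\epsilon^{-2}$ scaling.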
Abstract
In this paper, we establish last-iterate convergence rates for off-policy actor-critic methods in reinforcement learning. In particular, under a single-loop, single-timescale implementation and a broad class of policy updates, including approximate policy iteration and natural policy gradient methods, we prove the first $\tilde{O}(\epsilon^{-2})$ sample complexity guarantee for finding an $\epsilon$-optimal policy under minimal assumptions, namely, the existence of a policy that induces an irreducible Markov chain. This stands in stark contrast to the existing literature, where an $\tilde{O}(\epsilon^{-2})$ sample complexity is achieved only through nested-loop updates and/or under strong, algorithm-dependent assumptions on the policies, such as uniform mixing and uniform exploration. Technically, to address the challenges posed by the coupled update equations…
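To make "single-loop, single-timescale" concrete, here is a minimal sketch of such a loop on a small tabular MDP. It assumes a fixed uniform behavior policy, an expected-SARSA-style off-policy TD critic, and an NPG-style softmax actor step (adding $\eta Q$ to the logits at the visited state). The function names, the step sizes `alpha` and `eta`, and the toy MDP are hypothetical illustrations; this is not the paper's algorithm, behavior policy, or step-size schedule.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    total = sum(e)
    return [x / total for x in e]


def make_mdp(n_states=3, n_actions=2, seed=0):
    """Hypothetical random tabular MDP. Every transition probability is positive,
    so the Markov chain induced by any policy is irreducible."""
    rng = random.Random(seed)
    P = [[None] * n_actions for _ in range(n_states)]
    R = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            w = [rng.random() + 0.1 for _ in range(n_states)]
            P[s][a] = [x / sum(w) for x in w]
            R[s][a] = rng.random()
    return P, R


def policy_q(P, R, pi, gamma, iters=2000):
    """Q^pi by repeated Bellman evaluation (used only to check the result)."""
    nS, nA = len(P), len(P[0])
    Q = [[0.0] * nA for _ in range(nS)]
    for _ in range(iters):
        V = [sum(pi[s][a] * Q[s][a] for a in range(nA)) for s in range(nS)]
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(nA)] for s in range(nS)]
    return Q


def optimal_v(P, R, gamma, iters=2000):
    """V* by value iteration (used only to check the result)."""
    nS, nA = len(P), len(P[0])
    V = [0.0] * nS
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                 for a in range(nA)) for s in range(nS)]
    return V


def single_loop_actor_critic(P, R, gamma=0.9, T=20000, alpha=0.05, eta=0.05, seed=1):
    """One transition per iteration; critic and actor both update every step
    with constant step sizes of the same order (single loop, single timescale)."""
    rng = random.Random(seed)
    nS, nA = len(P), len(P[0])
    Q = [[0.0] * nA for _ in range(nS)]       # critic estimate of Q^{pi_t}
    logits = [[0.0] * nA for _ in range(nS)]  # softmax actor parameters
    s = 0
    for _ in range(T):
        a = rng.randrange(nA)  # behavior policy: uniform (assumption)
        s_next = rng.choices(range(nS), weights=P[s][a])[0]
        # Critic: off-policy TD step; the bootstrap averages over the target policy
        pi_next = softmax(logits[s_next])
        target = R[s][a] + gamma * sum(p * q for p, q in zip(pi_next, Q[s_next]))
        Q[s][a] += alpha * (target - Q[s][a])
        # Actor: NPG-style softmax step at the visited state (logits += eta * Q)
        for b in range(nA):
            logits[s][b] += eta * Q[s][b]
        s = s_next
    return [softmax(row) for row in logits], Q


if __name__ == "__main__":
    P, R = make_mdp()
    pi, _ = single_loop_actor_critic(P, R)
    Qpi = policy_q(P, R, pi, 0.9)
    Vpi = [sum(p * q for p, q in zip(pi[s], Qpi[s])) for s in range(len(P))]
    Vstar = optimal_v(P, R, 0.9)
    print("max optimality gap:", max(v - w for v, w in zip(Vstar, Vpi)))
```

Each iteration draws exactly one transition and immediately updates both the critic and the actor; there is no inner loop that evaluates the current policy to convergence, which is what separates this setting from the nested-loop schemes mentioned in the abstract.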
