Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation
Anas Barakat, Pascal Bianchi, Julien Lehmann

TL;DR
This paper provides the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting, covering both asymptotic convergence and finite-time behavior.
Contribution
It analyzes a three-timescale actor-critic algorithm (one timescale for the actor, two for the critic) whose critic is a two-timescale target-based version of TD learning with linear function approximation.
Findings
Proves asymptotic convergence of the algorithm.
Provides finite-time performance bounds.
Highlights the impact of target networks on learning stability.
Abstract
Actor-critic methods integrating target networks have exhibited stupendous empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing in the literature. In this paper, we reduce this gap between theory and practice by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single-timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale target-based version of TD learning closely inspired by practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the…
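To make the three-timescale structure described in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm as stated in the paper: the toy MDP, the feature map `phi`, the tabular softmax actor, the step-size exponents, and the function name `run_target_actor_critic` are all assumptions chosen for illustration. The only elements taken from the abstract are a linear critic, a separate target parameter used for bootstrapping, and three distinct step-size schedules.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def run_target_actor_critic(num_steps=2000, gamma=0.9, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = 3, 2
    # Hypothetical critic features phi(s): a bias term plus a scaled state index.
    phi = [[1.0, s / (n_states - 1)] for s in range(n_states)]

    omega = [0.0, 0.0]      # online critic weights (assumed fastest timescale)
    omega_bar = [0.0, 0.0]  # target critic weights (assumed intermediate timescale)
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor (assumed slowest)

    s = 0
    for t in range(1, num_steps + 1):
        # Illustrative step sizes with alpha/xi -> 0 and xi/beta -> 0 (assumption).
        beta = 0.5 * t ** -0.5   # critic
        xi = t ** -0.7           # target
        alpha = t ** -0.9        # actor

        probs = softmax(theta[s])
        a = rng.choices(range(n_actions), weights=probs)[0]

        # Toy dynamics (assumption): action 0 stays, action 1 moves to the next state.
        s_next = (s + a) % n_states
        r = 1.0 if s_next == 0 else 0.0

        # TD error bootstraps from the target weights, not the online weights.
        delta = r + gamma * dot(phi[s_next], omega_bar) - dot(phi[s], omega)

        # Critic step: semi-gradient TD update on the online weights.
        omega = [w + beta * delta * f for w, f in zip(omega, phi[s])]
        # Target step: slowly track the online weights.
        omega_bar = [wb + xi * (w - wb) for wb, w in zip(omega_bar, omega)]
        # Actor step: policy-gradient update using delta as an advantage estimate.
        for b in range(n_actions):
            score = (1.0 if b == a else 0.0) - probs[b]  # d log pi(a|s) / d theta[s][b]
            theta[s][b] += alpha * delta * score

        s = s_next

    policy = [softmax(row) for row in theta]
    return omega, omega_bar, policy
```

Calling `run_target_actor_critic(seed=0)` returns the online critic weights, the target critic weights, and the learned softmax policy for each state. The point of the sketch is the ordering of the three updates and the use of `omega_bar` inside the TD error; it makes no claim about reproducing the convergence rates analyzed in the paper.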
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Model Reduction and Neural Networks
