Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation

Anas Barakat; Pascal Bianchi; Julien Lehmann

arXiv:2106.07472·cs.LG·February 24, 2022·1 cites

Open Access

TL;DR

This paper provides the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting, establishing asymptotic convergence and finite-time performance bounds.

Contribution

It introduces a novel theoretical framework for understanding target networks in actor-critic algorithms with linear approximation.

Findings

01 Proves asymptotic convergence of the algorithm.

02 Provides finite-time performance bounds.

03 Highlights the impact of target networks on learning stability.

Abstract

Actor-critic methods integrating target networks have exhibited stupendous empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing from the literature. In this paper, we reduce this gap between theory and practice by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single-timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale target-based version of TD learning closely inspired by practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the…
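To make the three-timescale structure described in the abstract concrete, here is a minimal sketch of a target-based actor-critic loop with a linear critic. It is not the paper's exact algorithm. The random MDP, the feature matrix `Phi`, the tabular softmax policy, the step-size exponents, their ordering, and the function name `target_actor_critic` are all assumptions made for illustration.

```python
import numpy as np

def softmax_policy(theta, s):
    """Softmax over the action preferences of state s (assumed policy class)."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def target_actor_critic(n_states=5, n_actions=2, n_features=3,
                        gamma=0.9, n_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical random MDP: P[s, a] is a distribution over next states.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    # Hypothetical linear features with unit-norm rows.
    Phi = rng.normal(size=(n_states, n_features))
    Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)

    theta = np.zeros((n_states, n_actions))  # actor parameters
    omega = np.zeros(n_features)             # critic weights
    omega_bar = np.zeros(n_features)         # target critic weights
    s = 0
    for t in range(1, n_steps + 1):
        # Three step-size sequences (assumed exponents and ordering).
        beta = 1.0 / t ** 0.6   # critic
        xi = 1.0 / t ** 0.8     # target
        alpha = 1.0 / t         # actor

        pi = softmax_policy(theta, s)
        a = rng.choice(n_actions, p=pi)
        s_next = rng.choice(n_states, p=P[s, a])
        r = R[s, a]

        # TD error bootstraps from the target weights, not the critic's own.
        delta = r + gamma * Phi[s_next] @ omega_bar - Phi[s] @ omega
        omega = omega + beta * delta * Phi[s]
        # Target weights track the critic through a slower running average.
        omega_bar = omega_bar + xi * (omega - omega_bar)

        # Score function of the softmax policy: e_a - pi.
        score = -pi
        score[a] += 1.0
        theta[s] += alpha * delta * score
        s = s_next
    return theta, omega, omega_bar
```

Calling `target_actor_critic(n_steps=500)` returns the actor parameters along with the critic and target weights. The point of the sketch is only that the bootstrap term in the TD error reads `omega_bar`, so the critic regresses toward a slowly moving target.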

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Model Reduction and Neural Networks