Actor-Critic or Critic-Actor? A Tale of Two Time Scales

Shalabh Bhatnagar; Vivek S. Borkar; Soumyajit Guin

arXiv:2210.04470 · cs.LG · June 21, 2024 · 1 citation

TL;DR

This paper explores the effects of reversing the time scales in actor-critic algorithms, demonstrating that a critic-actor approach can emulate value iteration, with proven convergence and comparable empirical performance.

Contribution

The paper introduces and analyzes a critic-actor algorithm, an actor-critic variant with the two time scales reversed, and shows that it emulates value iteration and performs on par with standard actor-critic.

Findings

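1. Reversing the time scales of actor-critic emulates value iteration rather than policy iteration.

2. The proposed critic-actor algorithm comes with a proof of convergence.

3. In experiments with and without function approximation, critic-actor performs on par with standard actor-critic in both accuracy and computational effort.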

Abstract

We revisit the standard formulation of the tabular actor-critic algorithm as a two time-scale stochastic approximation, with the value function computed on a faster time scale and the policy computed on a slower time scale. This emulates policy iteration. We observe that reversing the time scales will in fact emulate value iteration and is a legitimate algorithm. We provide a proof of convergence and compare the two empirically with and without function approximation (with both linear and nonlinear function approximators), and observe that our proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.
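
The time-scale reversal described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the random MDP, the reward (rather than cost) formulation, the softmax policy, the step sizes n^-0.6 (fast) and 1/n (slow), the clipping of policy parameters to [-10, 10], and the names `make_mdp` and `run` are all assumptions made for illustration. The only difference between the two modes is which update receives the fast step size.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def make_mdp(n_states, n_actions, rng):
    """Hypothetical random MDP: P[s][a] is a distribution over next states,
    R[s][a] is a reward in [0, 1)."""
    P, R = [], []
    for _ in range(n_states):
        p_row, r_row = [], []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(w)
            p_row.append([x / total for x in w])
            r_row.append(rng.random())
        P.append(p_row)
        R.append(r_row)
    return P, R


def run(mode, n_states=4, n_actions=2, gamma=0.9, steps=20000, seed=0):
    """Run one trajectory with mode 'actor-critic' or 'critic-actor'."""
    rng = random.Random(seed)
    P, R = make_mdp(n_states, n_actions, rng)
    V = [0.0] * n_states                                  # critic: state values
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor: preferences
    visits = [0] * n_states
    s = 0
    for _ in range(steps):
        visits[s] += 1
        n = visits[s]
        fast, slow = n ** -0.6, 1.0 / n  # slow / fast -> 0 as n grows
        if mode == "actor-critic":       # critic fast, actor slow
            critic_lr, actor_lr = fast, slow
        elif mode == "critic-actor":     # reversed: actor fast, critic slow
            critic_lr, actor_lr = slow, fast
        else:
            raise ValueError(f"unknown mode: {mode}")
        pi = softmax(theta[s])
        a = rng.choices(range(n_actions), weights=pi)[0]
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        delta = R[s][a] + gamma * V[s_next] - V[s]  # temporal-difference error
        V[s] += critic_lr * delta
        for b in range(n_actions):
            # Gradient of log softmax policy with respect to theta[s][b].
            grad = (1.0 if b == a else 0.0) - pi[b]
            updated = theta[s][b] + actor_lr * delta * grad
            theta[s][b] = max(-10.0, min(10.0, updated))  # assumed projection
        s = s_next
    return V, theta
```

Calling `run("actor-critic")` and `run("critic-actor")` with the same seed uses the same MDP, so the two value estimates can be compared directly. A single short run of this toy sketch says nothing about the paper's empirical results.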

Code & Models

Repositories

gsoumyajit/actor-critic-critic-actor (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics