A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

Shangtong Zhang; Romain Laroche; Harm van Seijen; Shimon Whiteson; Remi Tachet des Combes

arXiv:2010.01069 · cs.LG · January 27, 2022 · 5 cites

TL;DR

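This paper examines the discounting mismatch in actor-critic algorithms: in theory both actor and critic are discounted, but practitioners usually drop the discounting from the actor update while keeping a discounted critic. It interprets this mismatch in terms of representation learning and a bias-variance trade-off, with supporting empirical evidence.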

Contribution

It offers a theoretical and empirical analysis of discounting mismatch in actor-critic algorithms, proposing new interpretations for both undiscounted and discounted objectives.

Findings

01 The discounting mismatch affects the bias-variance trade-off in the critic.

02 Omitting the discounting in the actor can be viewed as an auxiliary task.

03 Empirical results support the proposed interpretations.

Abstract

We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^{t}$ term in the actor update for the transition observed at time $t$ in a trajectory and the critic is a discounted value function. Practitioners, however, usually ignore the discounting ($\gamma^{t}$) for the actor while using a discounted critic. We investigate this mismatch in two scenarios. In the first scenario, we consider optimizing an undiscounted objective ($\gamma = 1$) where $\gamma^{t}$ disappears naturally ($1^{t} = 1$). We then propose to interpret the discounting in critic in terms of a bias-variance-representation trade-off and provide supporting empirical results. In the second scenario, we consider optimizing a discounted objective…
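
The mismatch described in the abstract can be illustrated with a small sketch. It is not taken from the paper or its repository; the function names `discounted_returns` and `actor_weights` are hypothetical, and Monte Carlo discounted returns stand in for the critic's estimate as a simplifying assumption. The sketch only shows how the per-step weight on the actor update differs when the $\gamma^{t}$ term is kept (theory) versus dropped (common practice).

```python
def discounted_returns(rewards, gamma):
    """Return G_t = sum_{k >= t} gamma^(k - t) * r_k for every step t.

    Assumption: this Monte Carlo return stands in for the discounted
    critic's estimate at step t.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def actor_weights(rewards, gamma, discount_actor):
    """Per-step weight multiplying the policy-gradient term at time t.

    discount_actor=True  keeps the gamma^t factor (the theoretical update).
    discount_actor=False drops it (the common practical update).
    Both names and this interface are hypothetical, for illustration only.
    """
    returns = discounted_returns(rewards, gamma)
    if discount_actor:
        return [(gamma ** t) * g for t, g in enumerate(returns)]
    return list(returns)


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(actor_weights(rewards, 0.5, discount_actor=True))
    print(actor_weights(rewards, 0.5, discount_actor=False))
```

With `gamma = 1.0` the two variants coincide, which corresponds to the first scenario in the abstract; with `gamma < 1` the undiscounted actor gives later transitions more weight than the theoretical update would.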

Code & Models

Repositories

ShangtongZhang/DeepRL
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Adversarial Robustness in Machine Learning