A2C is a special case of PPO

Shengyi Huang; Anssi Kanervisto; Antonin Raffin; Weixun Wang; Santiago Ontañón; Rousslan Fernand Julien Dossa

arXiv:2205.09123 · cs.LG · May 20, 2022

TL;DR

This paper demonstrates that Advantage Actor-Critic (A2C) is a special case of Proximal Policy Optimization (PPO). It supports the claim with theoretical analysis and with experiments showing that the two algorithms produce identical models when all other settings are controlled.

Contribution

It provides a theoretical and empirical link between A2C and PPO, revealing A2C as a specific instance of PPO.
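A sketch of the policy-gradient side of the argument, written in the standard PPO notation ($r_t$, $\hat{A}_t$, $\epsilon$, $\theta_{\mathrm{old}}$), which may differ from the paper's own symbols. The assumption is that PPO runs a single epoch over the whole rollout as one minibatch, so its only gradient step is taken at $\theta = \theta_{\mathrm{old}}$. There every ratio equals 1, which lies strictly inside $[1-\epsilon, 1+\epsilon]$, so in a neighborhood of $\theta_{\mathrm{old}}$ the clip is the identity and the min reduces to $r_t \hat{A}_t$:

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\,\hat{A}_t\big)\right]

\nabla_\theta L^{\mathrm{CLIP}}(\theta)\Big|_{\theta=\theta_{\mathrm{old}}}
= \hat{\mathbb{E}}_t\!\left[\hat{A}_t\,\frac{\nabla_\theta \pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\right]_{\theta=\theta_{\mathrm{old}}}
= \hat{\mathbb{E}}_t\!\left[\hat{A}_t\,\nabla_\theta \log \pi_\theta(a_t \mid s_t)\right]_{\theta=\theta_{\mathrm{old}}}
```

The last expression is A2C's policy gradient. For the resulting models to be identical, the remaining pieces (value loss, entropy term, advantage estimation, optimizer) must also agree between the two configurations.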

Findings

1. A2C and PPO produce the exact same models when all other settings are controlled.
2. Theoretical analysis shows that A2C is a special case of PPO.
3. Experiments with Stable-baselines3 confirm the theoretical findings.
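A quick numerical check of the theoretical finding, written as a sketch under our own assumptions rather than the paper's code: a single-state softmax policy over three hypothetical actions, with the PPO clipped objective differentiated by central finite differences. At θ = θ_old its gradient should match the A2C gradient Â·(onehot(a) − π), which is the analytic gradient of Â·log π(a) with respect to the logits.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_grad(logits, action, adv):
    # Analytic gradient of adv * log pi(action) w.r.t. softmax logits: adv * (onehot - pi)
    probs = softmax(logits)
    return [adv * ((1.0 if i == action else 0.0) - p) for i, p in enumerate(probs)]


def ppo_clip_objective(logits, old_logp, action, adv, eps=0.2):
    # Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)
    ratio = math.exp(math.log(softmax(logits)[action]) - old_logp)
    clipped = min(max(ratio, 1 - eps), 1 + eps)
    return min(ratio * adv, clipped * adv)


def numerical_grad(f, x, h=1e-6):
    # Central finite differences, one coordinate at a time
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


logits = [0.3, -1.2, 0.8]  # hypothetical theta_old (policy logits)
action, adv = 2, 1.7       # hypothetical sampled action and advantage estimate
old_logp = math.log(softmax(logits)[action])

g_ppo = numerical_grad(lambda th: ppo_clip_objective(th, old_logp, action, adv), logits)
g_a2c = a2c_grad(logits, action, adv)
print(max(abs(a - b) for a, b in zip(g_ppo, g_a2c)))  # tiny: finite-difference error only
```

Away from θ_old the ratio is no longer 1, so the PPO gradient is rescaled by r (or zeroed once the clip becomes active) and the agreement breaks.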

Abstract

Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms that have been used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different from A2C's objective. In this paper, however, we show that A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using Stable-baselines3, showing that A2C and PPO produce the exact same models when other settings are controlled.
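The experiment described in the abstract uses Stable-baselines3. The toy below is not that code; it is a self-contained sketch of the same kind of controlled comparison under simplifying assumptions of ours: a hypothetical 3-armed bandit with fixed payouts, a softmax actor, a scalar critic, one-step advantages, plain SGD (Stable-baselines3's A2C uses RMSprop by default), and no entropy bonus or advantage normalization. Both runs share a seed, so when PPO uses one epoch over the whole rollout as a single minibatch, every ratio is exactly 1 and each update matches A2C's.

```python
import math
import random

REWARDS = [0.2, 0.5, 1.0]  # hypothetical 3-armed bandit with deterministic payouts


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    # Inverse-CDF sampling of one action index from a categorical distribution
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1


def train(algo, n_epochs=1, iters=50, n_steps=5, lr=0.1, eps=0.2, seed=0):
    rng = random.Random(seed)
    logits, v = [0.0] * len(REWARDS), 0.0  # actor logits and a scalar critic
    for _ in range(iters):
        old_probs = softmax(logits)
        actions = [sample(old_probs, rng) for _ in range(n_steps)]
        rewards = [REWARDS[a] for a in actions]
        advs = [r - v for r in rewards]  # one-step advantages, fixed for the whole update
        for _ in range(1 if algo == "a2c" else n_epochs):
            probs = softmax(logits)
            grad = [0.0] * len(logits)
            for a, adv in zip(actions, advs):
                if algo == "a2c":
                    scale = adv  # gradient of adv * log pi(a) is adv * (onehot - pi)
                else:
                    ratio = probs[a] / old_probs[a]
                    # The clipped branch is selected and flat, so it contributes no gradient
                    clipped = (adv > 0 and ratio > 1 + eps) or (adv < 0 and ratio < 1 - eps)
                    scale = 0.0 if clipped else adv * ratio  # gradient of adv * ratio
                for i, p in enumerate(probs):
                    grad[i] += scale * ((1.0 if i == a else 0.0) - p) / n_steps
            logits = [x + lr * g for x, g in zip(logits, grad)]  # gradient ascent on the actor
            v += lr * sum(r - v for r in rewards) / n_steps  # gradient descent on 0.5 * (r - v)^2
    return logits, v


a2c = train("a2c")
ppo_one_epoch = train("ppo", n_epochs=1)  # one epoch, whole rollout as one minibatch
ppo_four_epochs = train("ppo", n_epochs=4)
print(a2c == ppo_one_epoch)    # True: identical parameters
print(a2c == ppo_four_epochs)  # False: extra epochs move PPO away from A2C
```

With more than one epoch, later passes see ratios different from 1 (and possibly clipped), and the critic is updated more often, so PPO's parameters drift away from A2C's.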

Code & Models

Repositories

vwxyzjn/a2c_is_a_special_case_of_ppo (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research

Methods: Entropy Regularization · A2C · Proximal Policy Optimization