A2C is a special case of PPO
Shengyi Huang, Anssi Kanervisto, Antonin Raffin, Weixun Wang, Santiago Ontañón, Rousslan Fernand Julien Dossa

TL;DR
This paper demonstrates that Advantage Actor-Critic (A2C) is a special case of Proximal Policy Optimization (PPO), supported by theoretical analysis and empirical evidence showing identical models under controlled settings.
Contribution
It provides a theoretical and empirical link between A2C and PPO, revealing A2C as a specific instance of PPO.
Findings
A2C and PPO produce identical models under controlled conditions.
Theoretical analysis shows A2C is a special case of PPO.
Empirical experiments confirm the theoretical findings.
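The special-case claim can be sketched with the standard PPO clipped objective. This is a minimal derivation under one assumption that this summary does not spell out: PPO takes a single gradient step on the whole rollout batch, so the policy being updated still equals the policy that collected the data. The symbols (r_t, \hat{A}_t, \epsilon, \theta_old) follow common PPO notation and are not taken from the text above.

```latex
% PPO's clipped surrogate objective, with probability ratio r_t
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

% At the first (and only) update, \theta = \theta_old, so r_t = 1 lies strictly
% inside [1-\epsilon, 1+\epsilon] and the clip is locally the identity. Hence:
\nabla_\theta L^{\mathrm{CLIP}}(\theta)\Big|_{\theta=\theta_{\mathrm{old}}}
  = \mathbb{E}_t\!\left[
      \frac{\nabla_\theta \pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
      \,\hat{A}_t\right]\Bigg|_{\theta=\theta_{\mathrm{old}}}
  = \mathbb{E}_t\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\hat{A}_t\right]
    \Big|_{\theta=\theta_{\mathrm{old}}}
```

The last expression is the policy-gradient term that A2C uses, which is why the two updates coincide when the remaining settings (optimizer, advantage estimation, batch handling) are matched.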
Abstract
Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different than A2C's objective. In this paper, however, we show A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using Stable-Baselines3, showing A2C and PPO produce the exact same models when other settings are controlled.
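A small numerical check can make the claim concrete. The sketch below is not the paper's Stable-Baselines3 experiment; it is a toy, standard-library-only illustration that assumes a three-action softmax policy with made-up logits and advantages. All function and variable names are hypothetical. It compares finite-difference gradients of the A2C policy loss and the PPO clipped loss when the current policy equals the data-collecting policy, then shows that the PPO gradient vanishes once the ratio leaves the clip range.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def a2c_loss(logits, action, advantage):
    # A2C policy loss for one sample: -A * log pi(a).
    return -advantage * math.log(softmax(logits)[action])

def ppo_clip_loss(logits, old_logits, action, advantage, clip_eps=0.2):
    # PPO clipped surrogate loss for one sample.
    ratio = softmax(logits)[action] / softmax(old_logits)[action]
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return -min(ratio * advantage, clipped * advantage)

def numerical_grad(loss_fn, logits, h=1e-6):
    # Central finite differences with respect to each logit.
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += h
        down = list(logits)
        down[i] -= h
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * h))
    return grad

theta_old = [0.3, -1.2, 0.8]  # illustrative logits of the rollout policy

# Case 1: current policy == rollout policy (ratio = 1), as in a single update.
max_diff = 0.0
for action, advantage in [(0, 1.5), (1, -0.7), (2, 2.0)]:
    g_a2c = numerical_grad(lambda th: a2c_loss(th, action, advantage), theta_old)
    g_ppo = numerical_grad(
        lambda th: ppo_clip_loss(th, theta_old, action, advantage), theta_old
    )
    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(g_a2c, g_ppo)))

# Case 2: the policy has moved far from the rollout policy, so the ratio for
# action 0 exceeds 1 + clip_eps and a positive advantage gets clipped.
theta_moved = [2.0, -1.2, 0.8]
clipped_grad = numerical_grad(
    lambda th: ppo_clip_loss(th, theta_old, 0, 1.5), theta_moved
)

print("max |grad_A2C - grad_PPO| at ratio = 1:", max_diff)
print("PPO gradient once the ratio is clipped:", clipped_grad)
```

Case 2 hints at why the equivalence depends on controlled settings: once PPO reuses a batch for several epochs or minibatches, the ratio drifts away from 1 and clipping changes the update.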
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research
Methods: Entropy Regularization · A2C · Proximal Policy Optimization
