Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

Logan Engstrom; Andrew Ilyas; Shibani Santurkar; Dimitris Tsipras; Firdaus Janoos; Larry Rudolph; Aleksander Madry

arXiv:2005.12729·cs.LG·May 27, 2020·137 cites

TL;DR

This paper demonstrates that code-level optimizations significantly influence the performance and behavior of deep policy gradient algorithms, specifically PPO and TRPO, highlighting challenges in attributing RL progress.

Contribution

It reveals that implementation details are crucial and often responsible for performance differences in deep RL algorithms, emphasizing the importance of careful attribution.

Findings

1. Code optimizations account for most of PPO's reward gains over TRPO.
2. Implementation details fundamentally alter how RL algorithms function.
3. Performance improvements are often due to auxiliary implementation choices.

Abstract

We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms: Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). Specifically, we investigate the consequences of "code-level optimizations": algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm. Seemingly of secondary importance, such optimizations turn out to have a major impact on agent behavior. Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function. These insights show the difficulty and importance of attributing performance gains in deep reinforcement learning. Code for reproducing our results is available at https://github.com/MadryLab/implementation-matters .
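
The abstract does not enumerate the code-level optimizations it studies. As a minimal, hypothetical sketch (not the authors' code from the linked repository), the Python below illustrates two optimizations commonly found in PPO implementations such as OpenAI Baselines: value-function clipping, and reward scaling by a running estimate of the standard deviation of the discounted return. The names `clipped_value_loss`, `RunningMeanStd`, and `RewardScaler`, and the defaults `eps=0.2` and `gamma=0.99`, are illustrative assumptions.

```python
import math


def clipped_value_loss(v_pred, v_old, returns, eps=0.2):
    """Value-function clipping (illustrative sketch).

    The new value prediction is kept within +/- eps of the old one, and the
    per-sample loss is the max of the unclipped and clipped squared errors.
    """
    losses = []
    for v, vo, r in zip(v_pred, v_old, returns):
        v_clipped = vo + max(-eps, min(eps, v - vo))
        losses.append(max((v - r) ** 2, (v_clipped - r) ** 2))
    return sum(losses) / len(losses)


class RunningMeanStd:
    """Welford-style running mean and (population) variance of a scalar stream."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Assumption: fall back to unit variance until two samples are seen.
        return self.m2 / self.count if self.count >= 2 else 1.0


class RewardScaler:
    """Reward scaling (illustrative sketch).

    Each reward is divided (not mean-centered) by the running standard
    deviation of the discounted return accumulated so far in the episode.
    """

    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma = gamma
        self.eps = eps
        self.ret = 0.0
        self.rms = RunningMeanStd()

    def __call__(self, reward, done=False):
        self.ret = self.ret * self.gamma + reward
        self.rms.update(self.ret)
        scaled = reward / math.sqrt(self.rms.var + self.eps)
        if done:
            self.ret = 0.0  # reset the discounted return at episode end
        return scaled
```

Neither change appears in PPO's clipped surrogate objective as it is usually stated, which is part of what makes such details easy to overlook when attributing performance gains.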

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Memory and Neural Computing

Methods: Entropy Regularization · Proximal Policy Optimization · Trust Region Policy Optimization