A Closer Look at Deep Policy Gradients

Andrew Ilyas; Logan Engstrom; Shibani Santurkar; Dimitris Tsipras; Firdaus Janoos; Larry Rudolph; Aleksander Madry

arXiv:1811.02553 · cs.LG · May 26, 2020 · 27 cites

Open Access

TL;DR

This paper critically examines deep policy gradient algorithms, revealing discrepancies between their theoretical motivations and actual behavior, and emphasizing the need for improved understanding and evaluation methods.

Contribution

It provides a detailed analysis showing that current deep policy gradient methods often deviate from their theoretical assumptions, highlighting gaps in understanding and evaluation.

Findings

01 Surrogate objectives do not align with true reward landscapes

02 Learned value estimators often fail to accurately predict true values

03 Gradient estimates poorly correlate with true gradients

Abstract

We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. To this end, we propose a fine-grained analysis of state-of-the-art methods based on key elements of this framework: gradient estimation, value prediction, and optimization landscapes. Our results show that the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict: the surrogate objective does not match the true reward landscape, learned value estimators fail to fit the true value function, and gradient estimates poorly correlate with the "true" gradient. The mismatch between predicted and empirical behavior we uncover highlights our poor understanding of current methods, and indicates the need to move beyond current benchmark-centric evaluation methods.
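As a rough illustration of what "gradient estimates poorly correlate with the true gradient" can mean in practice, the sketch below measures the cosine similarity between a sampled policy gradient and a reference gradient as the sample count grows. It is not the paper's experimental setup: the Gaussian policy, the quadratic reward, and every function name here are assumptions chosen so that the reference gradient is known in closed form.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two flat gradient vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def toy_true_gradient(theta):
    """Exact gradient of the expected reward in the toy problem (assumed setup).

    With a ~ N(theta, sigma^2 I) and r(a) = -||a||^2, the expected reward is
    -(||theta||^2 + d * sigma^2), so its gradient in theta is -2 * theta.
    """
    return -2.0 * theta


def score_function_gradient(theta, n_samples, rng, sigma=1.0):
    """Score-function (REINFORCE-style) gradient estimate from n_samples actions."""
    actions = theta + sigma * rng.standard_normal((n_samples, theta.size))
    rewards = -np.sum(actions ** 2, axis=1)
    # Gradient of log N(a; theta, sigma^2 I) with respect to theta.
    scores = (actions - theta) / sigma ** 2
    return (rewards[:, None] * scores).mean(axis=0)


def mean_cosine_to_truth(theta, n_samples, n_repeats, rng):
    """Average cosine similarity between repeated estimates and the exact gradient."""
    truth = toy_true_gradient(theta)
    sims = [
        cosine_similarity(score_function_gradient(theta, n_samples, rng), truth)
        for _ in range(n_repeats)
    ]
    return float(np.mean(sims))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.ones(10)
    for n in (10, 1_000, 100_000):
        print(n, mean_cosine_to_truth(theta, n, n_repeats=20, rng=rng))
```

In this toy setting the average similarity should rise toward 1 as the number of samples increases. For a deep policy there is no closed-form gradient, so a large-sample estimate would have to stand in for the reference, which is presumably why the abstract puts "true" in quotation marks.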

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications