A Closer Look at Deep Policy Gradients
Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

TL;DR
This paper critically examines deep policy gradient algorithms, revealing discrepancies between their theoretical motivations and actual behavior, and emphasizing the need for improved understanding and evaluation methods.
Contribution
The paper provides a fine-grained analysis of state-of-the-art deep policy gradient methods along three axes (gradient estimation, value prediction, and optimization landscapes), showing that their behavior often deviates from what their motivating framework would predict.
Findings
Surrogate objectives do not align with true reward landscapes
Learned value estimators often fail to accurately predict true values
Gradient estimates poorly correlate with true gradients
Abstract
We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. To this end, we propose a fine-grained analysis of state-of-the-art methods based on key elements of this framework: gradient estimation, value prediction, and optimization landscapes. Our results show that the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict: the surrogate objective does not match the true reward landscape, learned value estimators fail to fit the true value function, and gradient estimates poorly correlate with the "true" gradient. The mismatch between predicted and empirical behavior we uncover highlights our poor understanding of current methods, and indicates the need to move beyond current benchmark-centric evaluation methods.
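To make the gradient-estimation finding concrete, the following is a minimal sketch of how the agreement between a sampled policy gradient and a reference gradient could be measured. It is not the paper's experimental setup: the toy quadratic reward, the Gaussian policy, and all names (`reinforce_gradient`, `mean_cosine_to_reference`, `target`) are assumptions chosen so that the reference gradient is available in closed form.

```python
import numpy as np


def reinforce_gradient(theta, target, n_samples, rng, sigma=1.0):
    """Score-function (REINFORCE) estimate of the gradient of E[r(a)]
    with respect to theta, where a ~ N(theta, sigma^2 I).

    Toy reward (an assumption for illustration): r(a) = -||a - target||^2.
    """
    noise = rng.standard_normal((n_samples, theta.size))
    actions = theta + sigma * noise
    rewards = -np.sum((actions - target) ** 2, axis=1)
    # Subtract the batch-mean reward as a simple baseline to reduce variance.
    advantages = rewards - rewards.mean()
    # Gradient of log N(a; theta, sigma^2 I) with respect to theta.
    score = (actions - theta) / sigma**2
    return (advantages[:, None] * score).mean(axis=0)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def mean_cosine_to_reference(theta, target, n_samples, reference, rng, n_trials=20):
    """Average cosine similarity between independent gradient estimates
    and a reference gradient."""
    sims = [
        cosine(reinforce_gradient(theta, target, n_samples, rng), reference)
        for _ in range(n_trials)
    ]
    return float(np.mean(sims))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.zeros(10)
    target = np.ones(10)
    # For this toy reward, E[r] = -||theta - target||^2 - d * sigma^2,
    # so the exact gradient is 2 * (target - theta).
    true_grad = 2.0 * (target - theta)
    for n in (5, 50, 500, 5000):
        sim = mean_cosine_to_reference(theta, target, n, true_grad, rng)
        print(f"samples per estimate: {n:5d}  mean cosine to true gradient: {sim:.3f}")
```

In this toy setting the estimates align with the exact gradient only once the number of samples per estimate is large. In a deep RL setting the exact gradient is unavailable, so a reference would itself have to be estimated from a very large number of samples.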
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
