An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients
Jiaming Song, Yuhuai Wu

TL;DR
This technical report empirically evaluates PPOKFAC, which combines the PPO objective with K-FAC natural gradient optimization. In a range of MuJoCo environments it outperforms PPO in sample complexity and training speed and scales with batch size, but additional training epochs do not necessarily improve sample efficiency, and it appears to be worse than ACKTR.
Contribution
It studies PPOKFAC, a combination of the PPO objective and K-FAC natural gradient optimization, and analyzes its sample complexity, training speed, and sensitivity to batch size and number of training epochs.
Findings
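PPOKFAC outperforms PPO in sample complexity and speed in a range of MuJoCo environments.
PPOKFAC scales with batch size.
Adding more training epochs is not necessarily helpful for sample efficiency.
PPOKFAC appears to be worse than ACKTR, its A2C counterpart.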
Abstract
In this technical report, we consider an approach that combines the PPO objective and K-FAC natural gradient optimization, which we call PPOKFAC. We perform a range of empirical analyses on various aspects of the algorithm, such as sample complexity, training speed, and sensitivity to batch size and training epochs. We observe that PPOKFAC is able to outperform PPO in terms of sample complexity and speed in a range of MuJoCo environments, while being scalable in terms of batch size. In spite of this, it seems that adding more epochs is not necessarily helpful for sample efficiency, and PPOKFAC seems to be worse than its A2C counterpart, ACKTR.
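To make the two ingredients named in the abstract concrete, here is a minimal NumPy sketch of (a) the standard clipped PPO surrogate and (b) a Kronecker-factored preconditioning step for a single dense layer. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `clip_eps` and `damping` defaults, and the simple per-factor damping are all hypothetical choices made for this example.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate (to be maximized), averaged over a batch.

    clip_eps=0.2 is an illustrative default, not a value taken from the report.
    """
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))


def kfac_precondition(grad_W, acts, grad_out, damping=1e-2):
    """Approximate natural gradient for one dense layer y = acts @ W.

    grad_W:   (n_in, n_out) ordinary gradient of the objective w.r.t. W
    acts:     (batch, n_in) layer inputs
    grad_out: (batch, n_out) gradients w.r.t. the layer outputs

    K-FAC approximates the layer's Fisher matrix by a Kronecker product of
    A = E[a a^T] and G = E[g g^T], so the preconditioned gradient is
    A^{-1} @ grad_W @ G^{-1}. Adding `damping` to each factor is a
    simplifying assumption made for this sketch.
    """
    batch = acts.shape[0]
    A = acts.T @ acts / batch + damping * np.eye(acts.shape[1])
    G = grad_out.T @ grad_out / batch + damping * np.eye(grad_out.shape[1])
    left = np.linalg.solve(A, grad_W)      # A^{-1} @ grad_W
    return np.linalg.solve(G, left.T).T    # (...) @ G^{-1}, G is symmetric
```

In a PPOKFAC-style update, one would presumably compute the gradient of the clipped surrogate for each layer and pass it through a step like `kfac_precondition` before applying it; how the report schedules factor estimation, damping, and step sizes is not specified in this summary.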
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Entropy Regularization · A2C · Proximal Policy Optimization
