An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients

Jiaming Song; Yuhuai Wu

arXiv:1801.05566 · cs.AI · January 18, 2018 · 2 cites

TL;DR

This paper empirically evaluates PPOKFAC, which optimizes the PPO objective with K-FAC natural gradients. On a range of MuJoCo environments it improves on PPO in sample complexity and training speed and scales with batch size, but adding more training epochs does not necessarily help sample efficiency, and it appears to be worse than ACKTR.

Contribution

It introduces PPOKFAC, which combines the PPO objective with K-FAC natural gradient optimization, and empirically analyzes its sample complexity, training speed, and sensitivity to batch size and training epochs.
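As a point of reference for the "PPO objective" mentioned above, the following is a minimal sketch of the standard clipped surrogate objective from the original PPO formulation. It is not taken from this paper: the function name `ppo_clipped_objective`, the default `clip_eps=0.2`, and the toy inputs are assumptions chosen for illustration, and the paper may use a different variant or additional terms.

```python
import numpy as np

def ppo_clipped_objective(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    All arguments are 1-D arrays with one entry per sampled action.
    clip_eps is an illustrative default, not a value from the paper.
    """
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise minimum removes any incentive to push the ratio
    # outside [1 - clip_eps, 1 + clip_eps] in the direction that helps.
    return float(np.mean(np.minimum(unclipped, clipped)))

# Toy batch (hypothetical numbers): ratios are 1.5, 1.0 and 0.5.
log_prob_old = np.log(np.array([0.5, 0.5, 0.5]))
log_prob_new = np.log(np.array([0.75, 0.5, 0.25]))
advantages = np.array([1.0, 1.0, -1.0])
objective = ppo_clipped_objective(log_prob_new, log_prob_old, advantages)
```

In this toy batch the first and third samples are clipped (to ratios 1.2 and 0.8), so the policy gains nothing further from moving those ratios beyond the clipping range.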

Findings

01. PPOKFAC outperforms PPO in sample complexity and training speed on a range of MuJoCo environments.

02. PPOKFAC scales with batch size.

03. Adding more training epochs does not necessarily improve sample efficiency.

Abstract

In this technical report, we consider an approach that combines the PPO objective and K-FAC natural gradient optimization, which we call PPOKFAC. We perform a range of empirical analyses on various aspects of the algorithm, such as sample complexity, training speed, and sensitivity to batch size and training epochs. We observe that PPOKFAC is able to outperform PPO in terms of sample complexity and speed in a range of MuJoCo environments, while being scalable in terms of batch size. Despite this, adding more epochs does not seem to be necessarily helpful for sample efficiency, and PPOKFAC seems to be worse than its A2C counterpart, ACKTR.
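To make "K-FAC natural gradient optimization" more concrete, here is a minimal sketch of the Kronecker-factored preconditioning step for a single fully connected layer. It illustrates the general K-FAC idea (approximating the layer's Fisher matrix as a Kronecker product of two small factors), not the authors' implementation. The function name `kfac_precondition`, the simple additive damping, and the random inputs are assumptions; practical K-FAC implementations also use running averages of the factors, more careful damping, and step-size control.

```python
import numpy as np

def kfac_precondition(grad_w, activations, preact_grads, damping=1e-2):
    """Approximate natural gradient for one linear layer.

    grad_w:       (out, in) gradient of the objective w.r.t. the weights
    activations:  (batch, in) layer inputs
    preact_grads: (batch, out) gradients w.r.t. the layer's pre-activations,
                  assumed here to be already computed for the Fisher estimate
    damping:      illustrative Tikhonov damping added to each factor
    """
    batch = activations.shape[0]
    # Kronecker factors: second moments of inputs and of pre-activation gradients.
    a_factor = activations.T @ activations / batch      # (in, in)
    g_factor = preact_grads.T @ preact_grads / batch    # (out, out)
    a_damped = a_factor + damping * np.eye(a_factor.shape[0])
    g_damped = g_factor + damping * np.eye(g_factor.shape[0])
    # With F ~ A (x) G, F^{-1} vec(grad_w) = vec(G^{-1} grad_w A^{-1}),
    # so only the two small factors are ever inverted.
    left = np.linalg.solve(g_damped, grad_w)            # G^{-1} grad_w
    return np.linalg.solve(a_damped, left.T).T          # (...) A^{-1}, A symmetric

# Hypothetical layer with 4 inputs and 3 outputs, batch of 32.
rng = np.random.default_rng(0)
acts = rng.normal(size=(32, 4))
gs = rng.normal(size=(32, 3))
grad_w = gs.T @ acts / 32
nat_grad = kfac_precondition(grad_w, acts, gs)
```

The design point is cost: the full Fisher block for this layer would be 12 x 12, while the factored form only solves a 4 x 4 and a 3 x 3 system, and the gap widens quickly for realistic layer sizes.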

Taxonomy

Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Advanced Neural Network Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Entropy Regularization · A2C · Proximal Policy Optimization