Generalized Proximal Policy Optimization with Sample Reuse

James Queeney; Ioannis Ch. Paschalidis; Christos G. Cassandras

arXiv:2111.00072 · cs.LG · November 2, 2021 · 21 cites

TL;DR

This paper introduces a new reinforcement learning algorithm that combines the stability of on-policy methods with the sample efficiency of off-policy methods, supported by theoretical guarantees and empirical results.

Contribution

The paper develops a theoretically grounded off-policy version of PPO, called Generalized Proximal Policy Optimization with Sample Reuse (GePPO), that balances training stability against sample efficiency.

Findings

01. Improved performance over standard on-policy PPO.

02. Policy improvement guarantees derived for the off-policy setting.

03. Effective sample reuse in the off-policy setting.

Abstract

In real-world decision making tasks, it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while off-policy methods make more efficient use of data through sample reuse. In this work, we combine the theoretically supported stability benefits of on-policy algorithms with the sample efficiency of off-policy algorithms. We develop policy improvement guarantees that are suitable for the off-policy setting, and connect these bounds to the clipping mechanism used in Proximal Policy Optimization. This motivates an off-policy version of the popular algorithm that we call Generalized Proximal Policy Optimization with Sample Reuse. We demonstrate both theoretically and empirically that our algorithm delivers improved performance by effectively balancing the…
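The link between off-policy data and PPO-style clipping can be illustrated with a short sketch. The code below is not taken from the paper or its repository. It assumes that each sample carries the log-probability assigned by the older policy that collected it, and that the clipping interval is centered on the ratio between the current policy and that older policy. The function name `clipped_surrogate` and its parameters are hypothetical.

```python
import math


def clipped_surrogate(logp_new, logp_cur, logp_behavior, advantages, eps=0.2):
    """Average clipped surrogate over samples gathered by older policies.

    Illustrative sketch only (assumed form, hypothetical names):
      logp_new:      log-probabilities under the candidate policy being optimized
      logp_cur:      log-probabilities under the current policy (start of the update)
      logp_behavior: log-probabilities under the policy that collected each sample
      advantages:    advantage estimates for each sample
      eps:           half-width of the clipping interval
    """
    total = 0.0
    for ln, lc, lb, adv in zip(logp_new, logp_cur, logp_behavior, advantages):
        ratio = math.exp(ln - lb)    # candidate policy vs. behavior policy
        center = math.exp(lc - lb)   # current policy vs. behavior policy
        # Assumption: clip around `center` rather than around 1.
        clipped = min(max(ratio, center - eps), center + eps)
        # Pessimistic (minimum) objective, as in PPO's clipped surrogate.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When every sample comes from the current policy, `center` equals 1 and the expression reduces to the familiar PPO clipped objective, so under these assumptions on-policy PPO appears as a special case of the sketch.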

Code & Models

Repositories

jqueeney/geppo · tf · Official

Videos

Generalized Proximal Policy Optimization with Sample Reuse · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Data Classification