PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay

Xingxing Liang; Yang Ma; Yanghe Feng; Zhong Liu

arXiv:2112.03798 · cs.LG · December 9, 2021 · 5 cites

TL;DR

PTR-PPO enhances on-policy reinforcement learning by integrating prioritized trajectory replay, combining on-policy and off-policy methods to improve data efficiency and achieve state-of-the-art results in Atari tasks.

Contribution

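This paper introduces PTR-PPO, an algorithm that incorporates prioritized trajectory replay into PPO. It defines three trajectory priority measures (max, mean, and reward) and uses a truncated importance weight to reduce the variance caused by large importance weights.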

Findings

01 Achieves state-of-the-art performance on Atari tasks.

02 Memory size and rollout length significantly affect priority distribution.

03 Prioritized replay improves sampling efficiency in reinforcement learning.

Abstract

On-policy deep reinforcement learning algorithms have low data utilization and require significant experience for policy improvement. This paper proposes a proximal policy optimization algorithm with prioritized trajectory replay (PTR-PPO) that combines on-policy and off-policy methods to improve sampling efficiency by prioritizing the replay of trajectories generated by old policies. We first design three trajectory priorities based on the characteristics of trajectories: the first two being max and mean trajectory priorities based on one-step empirical generalized advantage estimation (GAE) values and the last being reward trajectory priorities based on normalized undiscounted cumulative reward. Then, we incorporate the prioritized trajectory replay into the PPO algorithm, propose a truncated importance weight method to overcome the high variance caused by large importance weights…
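
The abstract names three trajectory priorities and a truncated importance weight without giving formulas, so the following is only a minimal sketch of how such quantities could be computed, not the authors' implementation. Every function name and default here is hypothetical. It assumes that the GAE-based priorities use absolute advantage values, that the reward priority is min-max normalized across stored trajectories, and that truncation caps the probability ratio at a constant `c`.

```python
import math
import random


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state reached after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def max_priority(advantages):
    # Assumption: priority is the largest absolute advantage in the trajectory.
    return max(abs(a) for a in advantages)


def mean_priority(advantages):
    # Assumption: priority is the mean absolute advantage in the trajectory.
    return sum(abs(a) for a in advantages) / len(advantages)


def reward_priorities(returns, eps=1e-8):
    # Assumption: min-max normalization of undiscounted returns over the memory.
    lo, hi = min(returns), max(returns)
    return [(r - lo) / (hi - lo + eps) for r in returns]


def truncated_weight(logp_new, logp_old, c=1.0):
    # Importance weight pi_new / pi_old, capped at c to bound its variance.
    return min(c, math.exp(logp_new - logp_old))


def sample_index(priorities, alpha=1.0, rng=random):
    # Draw one trajectory index with probability proportional to priority**alpha.
    # The small constant keeps zero-priority trajectories reachable.
    weights = [(p + 1e-6) ** alpha for p in priorities]
    return rng.choices(range(len(priorities)), weights=weights, k=1)[0]
```

In this sketch, a replay memory would store one priority per trajectory, call `sample_index` to pick which old trajectory to replay, and multiply the PPO surrogate term by `truncated_weight` when the trajectory came from an older policy. The exponent `alpha` and the cap `c` are illustrative knobs, not values taken from the paper.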

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Entropy Regularization · Proximal Policy Optimization · Heatmap