PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay
Xingxing Liang, Yang Ma, Yanghe Feng, Zhong Liu

TL;DR
PTR-PPO adds prioritized trajectory replay to PPO, reusing trajectories generated by old policies to improve data efficiency; the authors report state-of-the-art results on Atari tasks.
Contribution
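This paper introduces PTR-PPO, an algorithm that incorporates prioritized trajectory replay into PPO. It defines three trajectory priority measures (max GAE, mean GAE, and normalized undiscounted cumulative reward) and uses a truncated importance weight to limit the variance caused by large importance weights.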
Findings
Achieves state-of-the-art performance on Atari tasks.
Memory size and rollout length significantly affect priority distribution.
Prioritized trajectory replay improves the sampling efficiency of PPO.
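To make the idea of prioritized sampling from a trajectory memory concrete, here is a minimal sketch. It assumes proportional prioritization with an exponent `alpha`, a rule borrowed from prioritized experience replay; this summary does not state the sampling rule PTR-PPO uses, and the names `sampling_probabilities`, `sample_trajectories`, `alpha`, and `eps` are hypothetical.

```python
import numpy as np

def sampling_probabilities(priorities, alpha=1.0, eps=1e-6):
    """Turn trajectory priorities into sampling probabilities.

    Assumption: P(i) is proportional to (priority_i + eps) ** alpha.
    alpha = 0 recovers uniform sampling; larger alpha sharpens the
    preference for high-priority trajectories.
    """
    p = (np.asarray(priorities, dtype=np.float64) + eps) ** alpha
    return p / p.sum()

def sample_trajectories(priorities, batch_size, rng, alpha=1.0):
    """Draw indices of stored trajectories (with replacement) for replay."""
    probs = sampling_probabilities(priorities, alpha)
    return rng.choice(len(probs), size=batch_size, p=probs)

# Usage: a memory holding four trajectories with hypothetical priorities.
rng = np.random.default_rng(0)
memory_priorities = [0.1, 0.5, 2.0, 0.4]
replay_indices = sample_trajectories(memory_priorities, batch_size=8, rng=rng)
```

Under this assumed rule, a larger memory spreads probability mass over more trajectories, which is one way memory size could shape the priority distribution noted above.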
Abstract
On-policy deep reinforcement learning algorithms have low data utilization and require significant experience for policy improvement. This paper proposes a proximal policy optimization algorithm with prioritized trajectory replay (PTR-PPO) that combines on-policy and off-policy methods to improve sampling efficiency by prioritizing the replay of trajectories generated by old policies. We first design three trajectory priorities based on the characteristics of trajectories: the first two being max and mean trajectory priorities based on one-step empirical generalized advantage estimation (GAE) values and the last being reward trajectory priorities based on normalized undiscounted cumulative reward. Then, we incorporate the prioritized trajectory replay into the PPO algorithm, propose a truncated importance weight method to overcome the high variance caused by large importance weights…
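The three trajectory priorities and the truncated importance weight described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract excerpt does not give exact formulas, so the sketch assumes that the GAE priorities use absolute advantage values, that the reward priority is min-max normalized across the memory, and that truncation takes the form min(cap, ratio). All function names and the `cap` parameter are hypothetical.

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Standard generalized advantage estimation over one trajectory.

    Assumes no episode termination inside the trajectory.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.append(np.asarray(values, dtype=np.float64), last_value)
    adv = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def gae_priorities(advantages):
    """Max and mean trajectory priorities (assumption: absolute GAE values)."""
    abs_adv = np.abs(np.asarray(advantages, dtype=np.float64))
    return abs_adv.max(), abs_adv.mean()

def reward_priorities(undiscounted_returns):
    """Reward priorities (assumption: min-max normalization across memory)."""
    r = np.asarray(undiscounted_returns, dtype=np.float64)
    span = r.max() - r.min()
    if span == 0.0:
        return np.ones_like(r)  # all trajectories equally important
    return (r - r.min()) / span

def truncated_importance_weights(logp_new, logp_old, cap=1.0):
    """Clip the ratio pi_new / pi_old from above to bound its variance."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    return np.minimum(ratio, cap)

# Usage on a toy three-step trajectory.
adv = gae([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], last_value=0.0, gamma=1.0, lam=1.0)
max_priority, mean_priority = gae_priorities(adv)
```

Truncating from above trades a small bias for lower variance when a replayed trajectory is much more likely under the current policy than under the policy that generated it.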
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Entropy Regularization · Proximal Policy Optimization · Heatmap
