Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
Jinyi Liu, Yi Ma, Jianye Hao, Yujing Hu, Yan Zheng, Tangjie Lv, Changjie Fan

TL;DR
This paper introduces a trajectory-based replay memory technique, Prioritized Trajectory Replay, that improves data sampling efficiency and performance in offline reinforcement learning by leveraging trajectory information and prioritized sampling.
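Using trajectory information first requires regrouping a flat dataset of transitions into episodes. The sketch below shows one way to do this. It assumes a D4RL-style dictionary of equal-length arrays with keys `observations`, `actions`, `rewards`, `terminals` and `timeouts`, and treats a step as an episode boundary when either flag is set. The function name `split_into_trajectories` and the toy data are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def split_into_trajectories(dataset):
    """Split a flat transition dataset into per-episode trajectories.

    Assumes a D4RL-style dict of equal-length arrays with keys
    'observations', 'actions', 'rewards', 'terminals' and 'timeouts'.
    An episode ends at any step where 'terminals' or 'timeouts' is set;
    a trailing, unfinished episode is kept as its own trajectory.
    """
    ends = np.logical_or(dataset["terminals"], dataset["timeouts"])
    n = len(dataset["rewards"])
    trajectories = []
    start = 0
    for i in range(n):
        if ends[i] or i == n - 1:
            # Slice every field over the episode [start, i] inclusive.
            trajectories.append({
                key: np.asarray(dataset[key][start:i + 1])
                for key in ("observations", "actions", "rewards", "terminals")
            })
            start = i + 1
    return trajectories

# Hypothetical toy dataset: a terminal at step 1, a timeout at step 3,
# and one unfinished step at the end.
toy = {
    "observations": np.arange(5, dtype=float).reshape(5, 1),
    "actions": np.zeros((5, 1)),
    "rewards": np.array([0.0, 1.0, 0.0, 0.5, 0.2]),
    "terminals": np.array([False, True, False, False, False]),
    "timeouts": np.array([False, False, False, True, False]),
}
trajs = split_into_trajectories(toy)
print([len(t["rewards"]) for t in trajs])  # → [2, 2, 1]
```

Keeping `timeouts` separate from `terminals` matters for bootstrapping: a time-limit cut ends the stored episode, but its last state is not a true terminal, so the stored `terminals` field stays False there.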
Contribution
It proposes a novel trajectory sampling method, extending replay memory to trajectories, and introduces prioritized sampling to enhance offline RL performance.
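Prioritized sampling at the trajectory level can be illustrated with a small buffer that stores whole trajectories and draws them with probability proportional to a priority score. This summary does not say which score PTR uses, so the sketch assumes, purely for illustration, the trajectory's undiscounted return, shifted to be positive and sharpened by an exponent `alpha` (with `alpha=0` giving uniform sampling). The class `TrajectoryReplay` and its parameters are hypothetical.

```python
import numpy as np

class TrajectoryReplay:
    """Minimal trajectory-level replay memory (illustrative sketch).

    Whole trajectories are sampled with probability proportional to
    priority ** alpha. The priority here is the trajectory return,
    a hypothetical choice made only for illustration.
    """

    def __init__(self, alpha=1.0, eps=1e-6, seed=0):
        self.trajectories = []   # one dict of arrays per episode
        self.priorities = []     # one scalar priority per trajectory
        self.alpha = alpha       # 0 = uniform, larger = greedier
        self.eps = eps           # keeps every trajectory sampleable
        self.rng = np.random.default_rng(seed)

    def add(self, states, actions, rewards):
        traj = {
            "states": np.asarray(states),
            "actions": np.asarray(actions),
            "rewards": np.asarray(rewards, dtype=float),
        }
        self.trajectories.append(traj)
        self.priorities.append(float(traj["rewards"].sum()))

    def probabilities(self):
        p = np.asarray(self.priorities)
        p = p - p.min() + self.eps   # shift so every priority is positive
        p = p ** self.alpha
        return p / p.sum()

    def sample(self, batch_size):
        # Draw trajectory indices (with replacement) according to priority.
        idx = self.rng.choice(len(self.trajectories), size=batch_size,
                              p=self.probabilities())
        return [self.trajectories[i] for i in idx]

buf = TrajectoryReplay(alpha=1.0)
for final_reward in (0.0, 1.0, 3.0):
    buf.add(states=[0, 1], actions=[0, 0], rewards=[0.0, final_reward])
print(buf.probabilities())
batch = buf.sample(4)
```

With returns 0, 1 and 3, the probabilities come out close to 0, 0.25 and 0.75. Because of the min-shift, the lowest-return trajectory keeps only the tiny `eps` mass; a rank-based priority would be one way to avoid starving it.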
Findings
Improved offline RL performance on D4RL benchmarks.
Enhanced data efficiency through backward trajectory sampling.
Effective avoidance of unseen actions during training.
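Why backward sampling can improve data efficiency is easiest to see in a toy tabular setting. The sketch below is a simplified illustration, not the paper's TR procedure: it applies one TD(0) sweep with learning rate 1 over a single chain trajectory whose only reward arrives on the final transition, once in forward order and once in backward order. The chain, `gamma=0.9` and the function names are hypothetical.

```python
import numpy as np

def make_chain_trajectory(n_steps):
    """One trajectory on a deterministic chain s_0 -> s_1 -> ... -> terminal.
    Each transition is (s, r, s_next, done); reward 1 only on the last one."""
    transitions = []
    for t in range(n_steps):
        done = (t == n_steps - 1)
        reward = 1.0 if done else 0.0
        transitions.append((t, reward, t + 1, done))
    return transitions

def td_sweep(values, transitions, gamma=0.9, lr=1.0, backward=False):
    """Apply one TD(0) update per transition, in forward or backward order."""
    order = reversed(transitions) if backward else transitions
    for s, r, s_next, done in order:
        # Bootstrap from the current estimate of the next state unless terminal.
        target = r + (0.0 if done else gamma * values[s_next])
        values[s] += lr * (target - values[s])
    return values

traj = make_chain_trajectory(5)
v_fwd = td_sweep(np.zeros(6), traj, backward=False)  # index 5 is the terminal state
v_bwd = td_sweep(np.zeros(6), traj, backward=True)
print(v_fwd[:5])
print(v_bwd[:5])
```

After one sweep, `v_fwd` is nonzero only at the last state, because each earlier state bootstraps from a successor that has not been updated yet. In the backward sweep, every successor is updated first, so the reward reaches the start state in a single pass (`v_bwd[0]` equals 0.9**4, about 0.656).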
Abstract
In recent years, data-driven reinforcement learning (RL), also known as offline RL, has gained significant attention. However, the role of data sampling techniques in offline RL has been overlooked, despite the potential such techniques have shown for enhancing online RL performance. Recent research suggests that applying sampling techniques directly to state transitions does not consistently improve performance in offline RL. Therefore, in this study, we propose a memory technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the sampling perspective to trajectories to extract more comprehensive information from limited data. TR improves learning efficiency by sampling trajectories backward, which makes better use of subsequent state information. Building on TR, we construct a weighted critic target to avoid sampling unseen actions in offline training, and Prioritized Trajectory Replay (PTR) that enables…
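The abstract names a weighted critic target but does not give its form. One plausible reading, sketched here as an assumption rather than the paper's definition, is that trajectory sampling makes the logged next action a' available, so the bootstrap term can mix Q(s', a') (an in-dataset, seen action) with Q(s', pi(s')) (a possibly unseen policy action) using a hypothetical weight `lam`: y = r + gamma * (1 - d) * [lam * Q(s', a') + (1 - lam) * Q(s', pi(s'))]. The function name and the weighting scheme are hypothetical.

```python
import numpy as np

def weighted_critic_target(reward, done, q_next_data, q_next_policy,
                           gamma=0.99, lam=0.5):
    """Hypothetical weighted TD target (illustrative sketch).

    q_next_data:   Q(s', a') for the next action stored in the trajectory,
                   i.e. an action that actually appears in the dataset.
    q_next_policy: Q(s', pi(s')) for the learned policy's action, which may
                   be out of distribution.
    lam:           weight on the in-dataset term; lam=1 gives a SARSA-style
                   target that never queries the policy's action.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    bootstrap = (lam * np.asarray(q_next_data, dtype=float)
                 + (1.0 - lam) * np.asarray(q_next_policy, dtype=float))
    # Terminal transitions (done = 1) keep only the immediate reward.
    return reward + gamma * (1.0 - done) * bootstrap

y = weighted_critic_target(reward=[1.0, 0.0], done=[0.0, 1.0],
                           q_next_data=[2.0, 5.0], q_next_policy=[10.0, 5.0],
                           gamma=0.5, lam=0.8)
print(y)
```

Raising `lam` toward 1 reduces the influence of a possibly overestimated Q-value at an unseen action; at `lam=1` the critic evaluates the behavior data rather than improving on it.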
Taxonomy
Topics: Reinforcement Learning in Robotics
