Prioritized Sequence Experience Replay

Marc Brittain; Josh Bertram; Xuxi Yang; Peng Wei

arXiv:1905.12726 · cs.LG · February 20, 2020 · 33 citations

TL;DR

This paper introduces Prioritized Sequence Experience Replay (PSER), a new method that prioritizes sequences of experiences to improve learning efficiency and performance in reinforcement learning, outperforming existing prioritized experience replay methods.

Contribution

The paper proposes PSER, a novel framework for sequence prioritization in experience replay, with theoretical convergence guarantees and empirical performance improvements over PER.

Findings

01. PSER converges faster than PER in theory.

02. Empirically, PSER outperforms PER on Atari benchmarks.

03. PSER enhances learning efficiency in deep reinforcement learning.

Abstract

Experience replay is widely used in deep reinforcement learning algorithms and allows agents to remember and learn from past experiences. In an effort to learn more efficiently, researchers proposed prioritized experience replay (PER), which samples important transitions more frequently. In this paper, we propose Prioritized Sequence Experience Replay (PSER), a framework for prioritizing sequences of experience in an attempt both to learn more efficiently and to obtain better performance. We compare the performance of the PER and PSER sampling techniques in a tabular Q-learning environment and in DQN on the Atari 2600 benchmark. We prove theoretically that PSER is guaranteed to converge faster than PER, and we show empirically that PSER substantially improves upon PER.
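
The abstract does not spell out how priorities are assigned, so the following is only a minimal sketch under stated assumptions. `sampling_probabilities` uses the proportional rule commonly associated with PER, P(i) = p_i^alpha / sum_k p_k^alpha. `decay_priority_backward` is a hypothetical illustration of what prioritizing a sequence could look like: a high priority is spread to the transitions stored just before it, shrinking by a factor `rho` per step. The function names, `rho`, `window`, and all numeric values are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha."""
    scaled = np.asarray(priorities, dtype=float) ** alpha
    return scaled / scaled.sum()


def decay_priority_backward(priorities, index, rho=0.5, window=3):
    """Hypothetical sequence step (an assumption, not the paper's definition).

    Spreads the priority at `index` to the `window` transitions stored just
    before it, multiplying by `rho` for each step back. An earlier transition
    keeps its own priority if that is already larger.
    """
    updated = np.array(priorities, dtype=float)
    for k in range(1, window + 1):
        j = index - k
        if j < 0:
            break  # reached the start of the buffer
        updated[j] = max(updated[j], updated[index] * rho ** k)
    return updated


# Five stored transitions; only the last one has a large priority.
priorities = [0.1, 0.1, 0.1, 0.1, 2.0]

per_probs = sampling_probabilities(priorities)
sequence_priorities = decay_priority_backward(priorities, index=4)
pser_probs = sampling_probabilities(sequence_priorities)
```

In this toy buffer, sampling from `per_probs` concentrates on the final transition alone, while `pser_probs` also raises the chance of drawing the transitions that led up to it. Indices can then be drawn with, for example, `np.random.default_rng().choice(len(pser_probs), p=pser_probs)`.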

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization

Methods: Prioritized Experience Replay · Experience Replay · Dense Connections · Convolution · Q-Learning · Deep Q-Network