Large Batch Experience Replay

Thibault Lahire; Matthieu Geist; Emmanuel Rachelson

arXiv:2110.01528 · cs.LG · June 15, 2022

TL;DR

This paper develops a theoretical foundation for replay buffer sampling in deep RL, introduces LaBER as an efficient approximation, and demonstrates improved performance across various environments.

Contribution

It derives the optimal sampling distribution for replay buffers, provides theoretical insights into Prioritized Experience Replay, and proposes LaBER, a practical and effective sampling method.

Findings

01. LaBER improves performance over baseline agents.

02. The paper derives the theoretically optimal sampling distribution for the replay buffer.

03. LaBER is effective in Atari and PyBullet environments.

Abstract

Several algorithms have been proposed to sample the replay buffer of deep Reinforcement Learning (RL) agents non-uniformly in order to speed up learning, but very few theoretical foundations for these sampling schemes have been provided. Among others, Prioritized Experience Replay appears to be a hyperparameter-sensitive heuristic, even though it can provide good performance. In this work, we cast the replay buffer sampling problem as an importance sampling one for estimating the gradient. This allows deriving the theoretically optimal sampling distribution, yielding the best theoretical convergence speed. Elaborating on the knowledge of the ideal sampling scheme, we exhibit new theoretical foundations of Prioritized Experience Replay. The optimal sampling distribution being intractable, we make several approximations providing good results in practice and introduce, among others, LaBER (Large Batch…
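To make the importance-sampling view of gradient estimation concrete, here is a minimal NumPy sketch on synthetic data. It is not the authors' implementation and does not attempt to reproduce LaBER, whose description is cut off above. It only illustrates a standard importance-sampling result: drawing sample i with probability proportional to its per-sample gradient norm, and reweighting by 1 / (n p_i), keeps the mean-gradient estimate unbiased while reducing its variance compared with uniform sampling. The function names, the synthetic `grads` array, and the buffer size are all assumptions made for illustration.

```python
import numpy as np


def is_gradient_estimate(grads, probs, batch_size, rng):
    """Importance-sampling estimate of the mean of `grads` (hypothetical helper).

    Indices are drawn with probabilities `probs`; each drawn gradient is
    reweighted by 1 / (n * p_i) so the estimate stays unbiased.
    """
    n = len(grads)
    idx = rng.choice(n, size=batch_size, p=probs)
    weights = 1.0 / (n * probs[idx])
    return (weights[:, None] * grads[idx]).mean(axis=0)


def estimator_variance(grads, probs):
    """Exact total variance (trace of the covariance) of the
    single-sample estimator g_i / (n * p_i) with i drawn from `probs`."""
    n = len(grads)
    sq_norms = (grads ** 2).sum(axis=1)
    mean = grads.mean(axis=0)
    second_moment = (sq_norms / (n ** 2 * probs)).sum()
    return second_moment - (mean ** 2).sum()


# Synthetic stand-in for per-sample gradients in a replay buffer (assumption):
# 1000 samples, 8 parameters, with heavy-tailed gradient magnitudes.
rng = np.random.default_rng(0)
scales = rng.exponential(size=(1000, 1)) ** 3
grads = scales * rng.normal(size=(1000, 8))

# Uniform sampling versus sampling proportional to the gradient norm.
uniform = np.full(len(grads), 1.0 / len(grads))
norms = np.linalg.norm(grads, axis=1)
proportional = norms / norms.sum()

var_uniform = estimator_variance(grads, uniform)
var_prop = estimator_variance(grads, proportional)
print(f"uniform: {var_uniform:.3f}  norm-proportional: {var_prop:.3f}")

# One mini-batch estimate of the mean gradient under the proportional scheme.
estimate = is_gradient_estimate(grads, proportional, batch_size=64, rng=rng)
```

The sketch assumes every per-sample gradient norm is nonzero, since a zero probability would make the importance weight undefined. In a real agent the per-sample gradient norms are not available for the whole buffer, which is presumably why the abstract calls the optimal distribution intractable and turns to approximations.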

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning

Methods: Experience Replay · Prioritized Experience Replay