When Does Non-Uniform Replay Matter in Reinforcement Learning?
Michal Korniak, Mikołaj Czarnecki, Yarden As, Piotr Miłoś, Pieter Abbeel, Michal Nauman

TL;DR
This paper clarifies when non-uniform replay sampling benefits off-policy reinforcement learning, emphasizing the roles of replay volume, recency, and sampling entropy, and introduces a simple, effective replay strategy.
Contribution
It provides practical guidelines for replay design, identifying conditions where non-uniform replay improves efficiency and proposing a Truncated Geometric sampling method.
Findings
Non-uniform replay benefits are most significant at low replay volumes.
High-entropy sampling matters even when the expected recency of sampled transitions is comparable.
The proposed Truncated Geometric replay enhances sample efficiency across various settings.
Abstract
Modern off-policy reinforcement learning algorithms often rely on simple uniform replay sampling, and it remains unclear when and why non-uniform replay improves over this strong baseline. Across diverse RL settings, we show that the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions per environment step; expected recency, how recent sampled transitions are; and the entropy of the replay sampling distribution. Our main contribution is clarifying when non-uniform replay is beneficial and providing practical guidance for replay design in modern off-policy RL. Namely, we find that non-uniform replay is most beneficial when replay volume is low, and that high-entropy sampling is important even at comparable expected recency. Motivated by these findings, we adopt a simple Truncated Geometric replay that biases sampling toward…
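To make the quantities named in the abstract concrete, here is a minimal Python sketch of one way a truncated geometric sampling distribution over a replay buffer could be built, along with its expected recency and entropy. This is an illustration under stated assumptions, not the authors' implementation: the abstract is cut off before it says what the sampling is biased toward, so the sketch assumes the bias is toward newer transitions, measures recency as age in steps (0 = newest), and assumes a circular buffer. The function names and the parameter `p` are hypothetical.

```python
import numpy as np


def truncated_geometric_probs(buffer_size: int, p: float) -> np.ndarray:
    """Return P(age = k) proportional to (1 - p)**k for k = 0..buffer_size-1.

    Assumption: age 0 is the newest transition, so larger p concentrates
    mass on recent data. Truncation renormalizes over the buffer length.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    ages = np.arange(buffer_size)
    weights = (1.0 - p) ** ages
    return weights / weights.sum()


def expected_recency(probs: np.ndarray) -> float:
    """Mean age (in steps) of a sampled transition under `probs`."""
    return float(np.dot(np.arange(len(probs)), probs))


def entropy(probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of the sampling distribution."""
    nonzero = probs[probs > 0]
    return float(-(nonzero * np.log(nonzero)).sum())


def sample_indices(rng, newest_index, buffer_size, probs, batch_size):
    """Draw ages from `probs` and map them to slots of a circular buffer.

    Assumption: the buffer is full and `newest_index` is the slot that
    holds the most recently written transition.
    """
    ages = rng.choice(buffer_size, size=batch_size, p=probs)
    return (newest_index - ages) % buffer_size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    size = 1000
    geometric = truncated_geometric_probs(size, p=0.01)
    uniform = np.full(size, 1.0 / size)
    for name, probs in [("uniform", uniform), ("truncated geometric", geometric)]:
        print(name, expected_recency(probs), entropy(probs))
    print(sample_indices(rng, newest_index=10, buffer_size=size,
                         probs=geometric, batch_size=8))
```

Under these assumptions, uniform sampling is the maximum-entropy case with a mean age of about half the buffer length, while the truncated geometric distribution trades some entropy for a lower mean age. Varying `p` moves along that trade-off, which is the axis the abstract's second and third factors describe.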
