Understanding the effect of varying amounts of replay per step
Animesh Kumar Paul, Videh Raj Nema

TL;DR
This paper systematically investigates how varying the amount of experience replay per step affects the performance of Deep Q-Networks, revealing that more replay enhances sample efficiency, stability, and robustness.
Contribution
It provides the first systematic study on the impact of replay frequency in DQN, linking experience replay to planning benefits in model-free reinforcement learning.
Findings
Increasing replay improves sample efficiency.
More replay reduces performance variability.
More replay makes DQN more robust to hyperparameter changes.
Abstract
Model-based reinforcement learning uses models to plan, where the predictions and policies of an agent can be improved by using more computation without additional data from the environment, thereby improving sample efficiency. However, learning accurate estimates of the model is hard. Consequently, the natural question is whether we can get benefits similar to those of planning with model-free methods. Experience replay is an essential component of many model-free algorithms, enabling sample-efficient learning and stability by providing a mechanism to store past experiences for further reuse in the gradient computation. Prior works have established connections between models and experience replay by planning with the latter. This involves increasing the number of times a mini-batch is sampled and used for updates at each step (the amount of replay per step). We attempt to exploit this…
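
The amount of replay per step described in the abstract can be illustrated with a small sketch. This is not the paper's DQN setup: it swaps in tabular Q-learning on a hypothetical five-state chain so the example stays self-contained, and the names `step_chain`, `train`, and `replay_per_step`, along with all hyperparameter values, are assumptions made for illustration.

```python
import random


def step_chain(state, action, n_states=5):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left.

    Reaching the right-most state yields reward 1 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(replay_per_step, env_steps=500, n_states=5, batch_size=8,
          alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with a replay buffer (a stand-in for DQN).

    For every single environment step, `replay_per_step` mini-batches are
    sampled from the buffer and used for updates.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    buffer = []          # stored transitions (unbounded in this sketch)
    updates = 0          # number of mini-batch updates performed
    state = 0

    for _ in range(env_steps):
        # Epsilon-greedy action selection, breaking ties at random.
        if rng.random() < epsilon or q[state][0] == q[state][1]:
            action = rng.randrange(2)
        else:
            action = 0 if q[state][0] > q[state][1] else 1

        # One step of real interaction with the environment.
        next_state, reward, done = step_chain(state, action, n_states)
        buffer.append((state, action, reward, next_state, done))
        state = 0 if done else next_state

        # The knob under study: how many mini-batches are replayed per step.
        for _ in range(replay_per_step):
            batch = rng.sample(buffer, min(batch_size, len(buffer)))
            for s, a, r, s2, d in batch:
                target = r if d else r + gamma * max(q[s2])
                q[s][a] += alpha * (target - q[s][a])
            updates += 1

    return q, updates
```

Calling `train(replay_per_step=1)` and `train(replay_per_step=4)` with the same `env_steps` uses the same amount of environment data but four times as many updates in the second case, which is the extra computation per sample that the abstract compares to planning.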
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing
Methods: Experience Replay
