Sample Efficiency in Sparse Reinforcement Learning: Or Your Money Back
Trevor A. McInroe

TL;DR
This paper introduces Or Your Money Back (OYMB), a replay memory sampler for sparse-reward reinforcement learning with Hindsight Experience Replay (HER). By controlling minibatch composition, it helps agents learn real goals faster and more effectively.
Contribution
The paper proposes OYMB, a novel replay memory sampling method that improves HER's efficiency by prioritizing real-goal memories, with demonstrated performance gains.
Findings
OYMB outperforms standard HER in multiple tasks.
Agents learn to achieve real goals more quickly with OYMB.
The method improves training efficiency in sparse reward settings.
Abstract
Sparse rewards present a difficult problem in reinforcement learning and may be inevitable in certain domains with complex dynamics such as real-world robotics. Hindsight Experience Replay (HER) is a recent replay memory development that allows agents to learn in sparse settings by altering memories to show them as successful even though they may not be. While, empirically, HER has shown some success, it does not provide guarantees around the makeup of samples drawn from an agent's replay memory. This may result in minibatches that contain only memories with zero-valued rewards or agents learning an undesirable policy that completes HER-adjusted goals instead of the actual goal. In this paper, we introduce Or Your Money Back (OYMB), a replay memory sampler designed to work with HER. OYMB improves training efficiency in sparse settings by providing a direct interface to the agent's…
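The minibatch-composition problem described in the abstract can be illustrated with a small sketch. The code below is not the paper's OYMB algorithm, whose details are not given in this summary; it is a generic stratified sampler under the assumption that each stored memory carries a flag saying whether HER relabeled its goal. The names `Memory`, `relabeled`, `sample_minibatch`, and `real_fraction` are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Memory:
    """One stored transition (hypothetical, reduced to what the sampler needs)."""
    reward: float
    relabeled: bool  # True if HER replaced the original goal with an achieved one


def sample_minibatch(buffer, batch_size, real_fraction, rng):
    """Draw a minibatch with a fixed share of real-goal memories.

    Uniform sampling gives no guarantee about how many real-goal
    memories end up in a batch. Here the two groups are sampled
    separately so that the share is set by `real_fraction`
    (an assumed knob, not a parameter taken from the paper).
    """
    real = [m for m in buffer if not m.relabeled]
    hindsight = [m for m in buffer if m.relabeled]

    # Cap each quota by what is actually available in the buffer.
    n_real = min(len(real), round(batch_size * real_fraction))
    n_hindsight = min(len(hindsight), batch_size - n_real)

    batch = rng.sample(real, n_real) + rng.sample(hindsight, n_hindsight)
    rng.shuffle(batch)  # avoid a fixed real-then-hindsight ordering
    return batch
```

With a buffer of 10 real-goal and 90 relabeled memories, calling `sample_minibatch(buffer, 20, 0.5, random.Random(0))` returns 20 memories, 10 of which are real-goal. A uniform draw from the same buffer would contain about 2 real-goal memories on average.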
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI)
Methods: Experience Replay
