Sample Efficiency in Sparse Reinforcement Learning: Or Your Money Back

Trevor A. McInroe

arXiv:2008.12693·cs.LG·August 31, 2020

Open Access

TL;DR

This paper introduces Or Your Money Back (OYMB), a replay memory sampler designed to work with Hindsight Experience Replay (HER) in sparse-reward reinforcement learning. By controlling minibatch composition, OYMB helps agents learn real goals faster and more effectively.

Contribution

The paper proposes OYMB, a novel replay memory sampling method that improves HER's efficiency by prioritizing real-goal memories, with demonstrated performance gains.

Findings

01. OYMB outperforms standard HER in multiple tasks.

02. Agents learn to achieve real goals more quickly with OYMB.

03. The method improves training efficiency in sparse reward settings.

Abstract

Sparse rewards present a difficult problem in reinforcement learning and may be inevitable in certain domains with complex dynamics such as real-world robotics. Hindsight Experience Replay (HER) is a recent replay memory development that allows agents to learn in sparse settings by altering memories to show them as successful even though they may not be. While, empirically, HER has shown some success, it does not provide guarantees around the makeup of samples drawn from an agent's replay memory. This may result in minibatches that contain only memories with zero-valued rewards or agents learning an undesirable policy that completes HER-adjusted goals instead of the actual goal. In this paper, we introduce Or Your Money Back (OYMB), a replay memory sampler designed to work with HER. OYMB improves training efficiency in sparse settings by providing a direct interface to the agent's…
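The concern raised in the abstract, that uniform sampling gives no guarantee about how many real-goal memories land in a minibatch, can be illustrated with a small sketch. This is not the paper's OYMB algorithm, whose details are not given in the text above. It is a minimal illustration of one way to control minibatch composition, assuming each stored memory carries a flag saying whether its goal is the real task goal or a HER-relabeled one. The names `Transition`, `is_real_goal`, `sample_minibatch`, and `real_fraction` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    """One replay memory entry (hypothetical layout for illustration)."""
    state: Tuple[float, ...]
    action: int
    reward: float
    goal: Tuple[float, ...]
    is_real_goal: bool  # True: the task's actual goal; False: HER-relabeled goal


def sample_minibatch(
    memory: List[Transition],
    batch_size: int,
    real_fraction: float,
    rng: random.Random,
) -> List[Transition]:
    """Draw a minibatch with a fixed share of real-goal memories.

    Assumption: a fixed target fraction is just one possible composition
    rule, chosen here for simplicity. It is not taken from the paper.
    """
    real = [t for t in memory if t.is_real_goal]
    relabeled = [t for t in memory if not t.is_real_goal]

    # Cap each count by what is actually available in memory.
    n_real = min(len(real), round(batch_size * real_fraction))
    n_relabeled = min(len(relabeled), batch_size - n_real)

    # Sample without replacement from each pool, then mix the order.
    batch = rng.sample(real, n_real) + rng.sample(relabeled, n_relabeled)
    rng.shuffle(batch)
    return batch
```

With a memory of 10 real-goal and 90 relabeled entries, calling `sample_minibatch(memory, 8, 0.25, random.Random(0))` returns 8 transitions, exactly 2 of which are real-goal memories. Uniform sampling over the same memory would give that count only on average.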

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI)

Methods: Experience Replay