Enabling Option Learning in Sparse Rewards with Hindsight Experience Replay
Gabriel Romio, Mateus Begnini Melchiades, Bruno Castro da Silva, Gabriel de Oliveira Ramos

TL;DR
This paper introduces MOC-2HER, a method that combines hierarchical reinforcement learning with dual goal relabeling and raises success rates to as much as 90% in sparse-reward robotic manipulation tasks.
Contribution
The paper proposes a dual-objective extension of HER that improves hierarchical RL in multi-goal, sparse-reward environments, especially for object manipulation.
Findings
MOC-2HER achieves a success rate of up to 90% on robotic manipulation tasks.
By comparison, standard MOC and MOC-HER each achieve less than 11%.
Dual goal relabeling improves learning efficiency in sparse-reward settings.
Abstract
Hierarchical Reinforcement Learning (HRL) frameworks like Option-Critic (OC) and Multi-updates Option Critic (MOC) have introduced significant advancements in learning reusable options. However, these methods underperform in multi-goal environments with sparse rewards, where actions must be linked to temporally distant outcomes. To address this limitation, we first propose MOC-HER, which integrates the Hindsight Experience Replay (HER) mechanism into the MOC framework. By relabeling goals from achieved outcomes, MOC-HER can solve sparse reward environments that are intractable for the original MOC. However, this approach is insufficient for object manipulation tasks, where the reward depends on the object reaching the goal rather than on the agent's direct interaction. This makes it extremely difficult for HRL agents to discover how to interact with these objects. To overcome this…
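The goal relabeling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the transition layout, the `sparse_reward` convention (0 on success, -1 otherwise), the tolerance `tol`, and the "final achieved goal" relabeling strategy are all assumptions chosen for illustration, and the sketch covers only plain HER-style relabeling, not the dual relabeling of MOC-2HER.

```python
import numpy as np

def sparse_reward(achieved_goal, goal, tol=0.05):
    # Assumed convention: 0.0 when the achieved goal is within tol of the
    # target goal, -1.0 otherwise.
    return 0.0 if np.linalg.norm(achieved_goal - goal) <= tol else -1.0

def relabel_final(episode, reward_fn=sparse_reward):
    # Hypothetical helper: replace the goal of every transition with the
    # goal actually achieved at the end of the episode, then recompute
    # the reward against that substituted goal.
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        relabeled.append({
            **t,
            "goal": new_goal,
            "reward": reward_fn(t["achieved_goal"], new_goal),
        })
    return relabeled

# A failed episode: the agent never gets near the original goal (5, 5).
episode = [
    {"obs": np.zeros(2), "action": 0,
     "achieved_goal": np.array([x, 0.0]), "goal": np.array([5.0, 5.0])}
    for x in (0.0, 0.5, 1.0)
]
hindsight = relabel_final(episode)
```

Under these assumptions, every transition in the original episode would receive the failure reward, while the relabeled copy ends with a success signal on its last transition, which gives the learner something to propagate credit from.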
Taxonomy
Topics: Reinforcement Learning in Robotics · Social Robot Interaction and HRI · Domain Adaptation and Few-Shot Learning
