PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

TL;DR
PIPER introduces a primitive-informed, preference-based hierarchical reinforcement learning method that uses hindsight relabeling and regularization to improve success rates in sparse-reward robotic tasks.
Contribution
The paper presents PIPER, a novel hierarchical RL approach that replaces human feedback with environment-generated rewards and mitigates non-stationarity through relabeling and primitive-informed regularization.
Findings
Achieves success rates above 50% on challenging sparse-reward tasks.
Effectively mitigates non-stationarity in hierarchical reinforcement learning.
Outperforms most baselines in robotic environments.
Abstract
In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mitigate non-stationarity, which is common in existing hierarchical approaches, and demonstrates impressive performance across a range of challenging sparse-reward tasks. Since obtaining human feedback is typically impractical, we propose to replace the human-in-the-loop approach with our primitive-in-the-loop approach, which generates feedback using sparse rewards provided by the environment. Moreover, in order to prevent infeasible subgoal prediction and avoid degenerate solutions, we propose…
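The pipeline described in the abstract (preference labels derived from sparse environment rewards, a reward model fit to those preferences, and relabeling of the higher-level replay buffer) can be sketched in a toy form. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear reward model, the Bradley-Terry-style preference probability, the rule "prefer the segment with the larger summed sparse reward", and all function names (`preference_label`, `fit_reward_model`, `relabel`) are hypothetical choices made for this example.

```python
import math

def preference_label(rewards_a, rewards_b):
    """Assumed primitive-in-the-loop rule: prefer the segment whose summed
    sparse environment reward is larger. Returns 1.0 (a), 0.0 (b), 0.5 (tie)."""
    ra, rb = sum(rewards_a), sum(rewards_b)
    if ra > rb:
        return 1.0
    if ra < rb:
        return 0.0
    return 0.5

def segment_features(segment):
    """Sum per-step feature vectors over one segment (list of equal-length lists)."""
    return [sum(col) for col in zip(*segment)]

def preference_prob(w, seg_a, seg_b):
    """Bradley-Terry-style probability that segment a is preferred,
    under an assumed linear reward r(x) = w . x summed over the segment."""
    fa, fb = segment_features(seg_a), segment_features(seg_b)
    diff = sum(wi * (a - b) for wi, a, b in zip(w, fa, fb))
    return 1.0 / (1.0 + math.exp(-diff))

def fit_reward_model(pairs, labels, dim, lr=0.1, steps=200):
    """Fit w by gradient descent on the cross-entropy preference loss."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for (seg_a, seg_b), y in zip(pairs, labels):
            p = preference_prob(w, seg_a, seg_b)
            fa, fb = segment_features(seg_a), segment_features(seg_b)
            for i in range(dim):
                # d(loss)/dw_i = (p - y) * (feature difference between segments)
                grad[i] += (p - y) * (fa[i] - fb[i])
        w = [wi - lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w

def relabel(buffer, w):
    """Replace each stored higher-level reward with the learned reward."""
    return [(feat, sum(wi * x for wi, x in zip(w, feat))) for feat, _old in buffer]

# Toy usage: two segment pairs with one-dimensional features.
pairs = [
    ([[1.0], [1.0]], [[0.0], [0.0]]),
    ([[0.0], [0.5]], [[1.0], [1.0]]),
]
labels = [
    preference_label([0, 1], [0, 0]),  # segment a reached the goal
    preference_label([0, 0], [0, 1]),  # segment b reached the goal
]
w = fit_reward_model(pairs, labels, dim=1)
relabeled = relabel([([1.0], 0.0), ([0.0], 0.0)], w)
```

In this sketch the relabeled rewards depend only on the stored higher-level features and the learned weights, not on how the lower-level policy behaved when the transition was collected, which is the property the abstract credits with mitigating non-stationarity.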
Taxonomy
Topics: Emotion and Mood Recognition
