PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

Utsav Singh; Wesley A. Suttle; Brian M. Sadler; Vinay P. Namboodiri; Amrit Singh Bedi

arXiv:2404.13423·cs.LG·June 18, 2024

TL;DR

PIPER introduces a primitive-informed, preference-based hierarchical reinforcement learning method that uses hindsight relabeling and regularization to improve success rates in sparse-reward robotic tasks.

Contribution

The paper presents PIPER, a hierarchical RL approach that replaces human preference feedback with feedback generated from the environment's sparse rewards. It mitigates non-stationarity by relabeling higher-level replay buffers with a learned reward model, and adds primitive-informed regularization.

Findings

01. Achieves over 50% success rates in challenging sparse-reward tasks.

02. Effectively mitigates non-stationarity in hierarchical reinforcement learning.

03. Outperforms most baselines in robotic environments.

Abstract

In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mitigate non-stationarity, which is common in existing hierarchical approaches, and demonstrates impressive performance across a range of challenging sparse-reward tasks. Since obtaining human feedback is typically impractical, we propose to replace the human-in-the-loop approach with our primitive-in-the-loop approach, which generates feedback using sparse rewards provided by the environment. Moreover, in order to prevent infeasible subgoal prediction and avoid degenerate solutions, we propose…
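The pipeline the abstract describes (feedback derived from sparse rewards, a preference-trained reward model, and relabeling of higher-level transitions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the choice to compare segments by summed sparse reward, the Bradley-Terry cross-entropy loss (a common choice in preference-based RL), and the `(state, goal, subgoal)` reward-model signature are all hypothetical.

```python
import numpy as np


def primitive_in_the_loop_label(sparse_rewards_a, sparse_rewards_b):
    """Preference label for two segments, derived from sparse environment rewards.

    Assumption: the segment with the larger summed sparse reward is preferred.
    Returns 1.0 if A is preferred, 0.0 if B is preferred, 0.5 on a tie.
    """
    ret_a = float(np.sum(sparse_rewards_a))
    ret_b = float(np.sum(sparse_rewards_b))
    if ret_a > ret_b:
        return 1.0
    if ret_a < ret_b:
        return 0.0
    return 0.5


def bradley_terry_loss(pred_a, pred_b, label):
    """Cross-entropy loss under a Bradley-Terry preference model.

    pred_a, pred_b: per-step rewards predicted by the reward model for each segment.
    P(A preferred) = sigmoid(sum(pred_a) - sum(pred_b)).
    """
    logit = float(np.sum(pred_a) - np.sum(pred_b))
    # Numerically stable log-sigmoid: log(sigmoid(x)) = -logaddexp(0, -x).
    log_p_a = -np.logaddexp(0.0, -logit)
    log_p_b = -np.logaddexp(0.0, logit)
    return float(-(label * log_p_a + (1.0 - label) * log_p_b))


def relabel_buffer(buffer, reward_fn):
    """Return a copy of higher-level transitions with rewards replaced by reward_fn.

    Assumption: each transition is a dict with 'state', 'goal', and 'subgoal' keys,
    and reward_fn(state, goal, subgoal) stands in for the learned reward model.
    """
    return [
        dict(t, reward=float(reward_fn(t["state"], t["goal"], t["subgoal"])))
        for t in buffer
    ]


# Toy usage: a stand-in "reward model" that scores subgoals by closeness to the goal.
toy_reward = lambda state, goal, subgoal: -np.linalg.norm(subgoal - goal)
toy_buffer = [
    {"state": np.zeros(2), "goal": np.ones(2), "subgoal": np.ones(2), "reward": -1.0},
]
relabeled = relabel_buffer(toy_buffer, toy_reward)
label = primitive_in_the_loop_label(np.array([-1.0, -1.0, 0.0]), np.array([-1.0, -1.0, -1.0]))
```

Because the relabeled reward depends only on the reward model's inputs and not on how the current lower-level primitive behaved when the transition was collected, old transitions stay usable as the primitive changes, which is the intuition behind the abstract's non-stationarity claim.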

Code & Models

Repositories

utsavz/piper (tf, Official)

Taxonomy

Topics: Emotion and Mood Recognition