DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning
Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

TL;DR
DIPPER introduces a hierarchical reinforcement learning method that uses direct preference optimization and primitive-informed regularization to efficiently learn complex robotics tasks from limited human preferences, outperforming existing methods.
Contribution
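The paper presents DIPPER, a hierarchical RL approach that combines direct preference optimization with primitive-informed regularization. It targets the computational cost of standard preference-based methods, along with the non-stationarity and infeasible-subgoal problems of hierarchical RL.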
Findings
DIPPER outperforms hierarchical and non-hierarchical baselines.
It improves computational efficiency over standard preference-based methods.
It mitigates non-stationarity and infeasible subgoal issues in hierarchical RL.
Abstract
Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, while limited human preference data often is; making efficient use of such data to guide learning is therefore essential. Methods for learning to perform complex robotics tasks from human preference data must overcome both these challenges simultaneously. In this work, we introduce DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning, an efficient hierarchical approach that leverages direct preference optimization to learn a higher-level policy…
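For readers unfamiliar with direct preference optimization, the following is a minimal sketch of the generic DPO loss for a single preference pair. It is not the DIPPER objective: the abstract above does not give the paper's formulation, and the primitive-informed regularization is not modeled here. The function names, the reference-policy arguments, and the default `beta` are illustrative assumptions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Generic DPO loss for one preference pair (illustrative only).

    logp_w / logp_l: log-probabilities the learned policy assigns to the
        preferred (w) and rejected (l) outputs.
    ref_logp_w / ref_logp_l: the same quantities under a fixed reference
        policy (an assumption of standard DPO, not taken from the paper).
    beta: temperature scaling the implicit reward (hypothetical default).
    """
    # Implicit reward margin between preferred and rejected outputs.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # The loss falls below log(2) when the policy favors the preferred output
    # more strongly than the reference does.
    print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

The loss is minimized by raising the policy's relative log-probability of the preferred output, so no separate reward model needs to be fit from the preference data.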
Taxonomy
Topics: Smart Parking Systems Research
