Preference-Guided Reinforcement Learning for Efficient Exploration
Guojian Wang, Jianxiang Liu, Xinyuan Li, Faguo Wu, Xiao Zhang, Tianyuan Chen, Xuyang Chen

TL;DR
This paper introduces LOPE, a preference-guided reinforcement learning framework that improves exploration efficiency in challenging tasks by leveraging human feedback directly, without learning a separate reward model.
Contribution
LOPE is an end-to-end RL framework that uses trajectory preference guidance to improve exploration in hard-exploration tasks, without learning a separate reward model.
Findings
LOPE outperforms state-of-the-art methods in challenging environments.
LOPE achieves faster convergence and better overall performance.
A theoretical analysis provides bounds on the performance improvement.
Abstract
In this paper, we investigate preference-based reinforcement learning (PbRL), which enables reinforcement learning (RL) agents to learn from human feedback. This is particularly valuable when defining a fine-grained reward function is not feasible. However, this approach is inefficient and impractical for promoting deep exploration in hard-exploration tasks with long horizons and sparse rewards. To tackle this issue, we introduce LOPE (Learning Online with trajectory Preference guidancE), an end-to-end preference-guided RL framework that enhances exploration efficiency in hard-exploration tasks. Our intuition is that LOPE directly adjusts the focus of online exploration by considering human feedback as guidance, thereby avoiding the need to learn a separate reward model from preferences. Specifically, LOPE includes a two-step sequential policy…
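For context, the "separate reward model" that the abstract says LOPE avoids is commonly fit in conventional PbRL with a Bradley-Terry model over pairs of trajectory segments. The sketch below illustrates that conventional step only; it is not LOPE's method, and the function names and the use of plain per-step reward lists are assumptions made for illustration.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    rewards_a / rewards_b are per-step outputs of a (hypothetical) learned
    reward model for each segment; only their sums matter here.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic sigmoid of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy loss for one labeled pair.

    label is 1.0 if the annotator preferred A and 0.0 if they preferred B.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

In a reward-model pipeline, this loss would be minimized over many labeled pairs before any policy optimization starts; the abstract's claim is that LOPE uses the preference labels as guidance for online exploration instead.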
Taxonomy
Topics: Robotic Path Planning Algorithms · Reinforcement Learning in Robotics · Data Stream Mining Techniques
