Preference-Guided Reinforcement Learning for Efficient Exploration

Guojian Wang; Jianxiang Liu; Xinyuan Li; Faguo Wu; Xiao Zhang; Tianyuan Chen; Xuyang Chen

arXiv:2407.06503·cs.LG·November 11, 2025

TL;DR

This paper introduces LOPE, a preference-guided reinforcement learning framework that improves exploration efficiency in challenging tasks by leveraging human feedback directly, without learning a separate reward model.

Contribution

LOPE is a novel end-to-end RL framework that uses trajectory preference guidance to enhance exploration in hard tasks, avoiding the need for reward modeling.

Findings

1. LOPE outperforms state-of-the-art methods in challenging environments.

2. LOPE achieves faster convergence and better overall performance.

3. Theoretical analysis bounds performance improvements.

Abstract

In this paper, we investigate preference-based reinforcement learning (PbRL), which enables reinforcement learning (RL) agents to learn from human feedback. This is particularly valuable when defining a fine-grained reward function is not feasible. However, this approach is inefficient and impractical for promoting deep exploration in hard-exploration tasks with long horizons and sparse rewards. To tackle this issue, we introduce LOPE (Learning Online with trajectory Preference guidancE), an end-to-end preference-guided RL framework that enhances exploration efficiency in hard-exploration tasks. Our intuition is that LOPE directly adjusts the focus of online exploration by treating human feedback as guidance, thereby avoiding the need to learn a separate reward model from preferences. Specifically, LOPE includes a two-step sequential policy…

Code & Models

Repositories

buaawgj/lope (PyTorch, official)

Taxonomy

Topics: Robotic Path Planning Algorithms · Reinforcement Learning in Robotics · Data Stream Mining Techniques

Methods: Focus