SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning

Hexian Ni; Tao Lu; Haoyuan Hu; Yinghao Cai; Shuo Wang

arXiv:2506.14648·cs.RO·May 22, 2026

TL;DR

SENIOR is an efficient query selection and preference-guided exploration method for preference-based reinforcement learning that significantly improves human feedback efficiency and policy learning speed in complex robot tasks.

Contribution

The paper proposes a novel Motion-Distinction-based Selection (MDS) scheme and a preference-guided exploration method to improve feedback efficiency and accelerate policy learning in preference-based reinforcement learning (PbRL).

Findings

01 Outperforms five existing methods in feedback efficiency and convergence speed.

02 Effective in both simulated and real-world robot manipulation tasks.

03 Videos demonstrating results are available online.

Abstract

Preference-based Reinforcement Learning (PbRL) methods avoid reward engineering by learning reward models from human preferences. However, poor feedback efficiency and sample efficiency remain problems that hinder the application of PbRL. In this paper, we present a novel efficient query selection and preference-guided exploration method, called SENIOR, which selects meaningful and easy-to-compare behavior segment pairs to improve human feedback efficiency and accelerates policy learning with designed preference-guided intrinsic rewards. Our key idea is twofold: (1) We design a Motion-Distinction-based Selection scheme (MDS). It selects segment pairs with apparent motion and different directions through kernel density estimation of states; such pairs are more task-related and easier for humans to label with preferences; (2) We propose a novel…
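The abstract describes MDS only at a high level, so the following is a minimal illustrative sketch of the idea rather than the authors' implementation. It assumes each segment is an array of low-dimensional states (for example, end-effector positions), uses the inverse of a segment's average Gaussian kernel density as a stand-in for "apparent motion", and uses net-displacement direction for "different directions". The function names, the bandwidth value, and the scoring rule are all hypothetical.

```python
import itertools
import numpy as np


def mean_kde_density(states, bandwidth=0.1):
    """Average Gaussian-KDE density of a segment, evaluated at its own states.

    A segment whose states are tightly clustered (little motion) gets a high
    value; a segment whose states are spread out gets a low value.
    """
    states = np.asarray(states, dtype=float)
    _, dim = states.shape
    diffs = states[:, None, :] - states[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm
    return float(kernel.mean())


def motion_score(states, bandwidth=0.1):
    """Assumed proxy for 'apparent motion': inverse of the mean state density."""
    return 1.0 / mean_kde_density(states, bandwidth)


def direction(states):
    """Unit vector of net displacement (zero vector if the segment barely moves)."""
    states = np.asarray(states, dtype=float)
    delta = states[-1] - states[0]
    length = np.linalg.norm(delta)
    return delta / length if length > 1e-8 else np.zeros_like(delta)


def pair_score(seg_a, seg_b, bandwidth=0.1):
    """Hypothetical score: both segments must move, and in different directions."""
    motion = min(motion_score(seg_a, bandwidth), motion_score(seg_b, bandwidth))
    dissimilarity = 1.0 - float(np.dot(direction(seg_a), direction(seg_b)))
    return motion * dissimilarity


def select_query(segments, bandwidth=0.1):
    """Return the index pair (i, j) with the highest pair score."""
    pairs = itertools.combinations(range(len(segments)), 2)
    return max(pairs, key=lambda ij: pair_score(segments[ij[0]], segments[ij[1]], bandwidth))


# Toy 2-D segments, 20 states each (made-up data for illustration only).
t = np.linspace(0.0, 1.0, 20)[:, None]
segments = [
    t * np.array([1.0, 0.0]),    # moves right
    t * np.array([-1.0, 0.0]),   # moves left
    t * np.array([0.01, 0.0]),   # nearly static
    t * np.array([0.9, 0.1]),    # moves right, similar to segment 0
]
best_pair = select_query(segments)
print(best_pair)
```

With this toy data the selected query pairs the right-moving and left-moving segments: the nearly static segment is penalized by its high state density, and the two similar right-moving segments are penalized by their near-identical directions.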

Code & Models

Repositories

https://2025senior.github.io

Taxonomy

Topics: Data Stream Mining Techniques · Data Management and Algorithms · Recommender Systems and Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings