OHP-RL: Online Human Preference as Guidance in Reinforcement Learning for Robot Manipulation

Yunyang Mo (1), Jian Li (1), Qiwei Wu (1), Yihang Kang (1), Renjing Xu (1) ((1) The Hong Kong University of Science and Technology (Guangzhou))

arXiv:2605.15971 · cs.RO · May 18, 2026

TL;DR

OHP-RL is a reinforcement learning framework that treats human interventions as preference information to guide policy learning, aiming for safer, more efficient robot manipulation with less human effort and improved stability.

Contribution

The paper introduces a state-dependent preference gate in RL that effectively incorporates intermittent human feedback for robot manipulation tasks.

Findings

01 Achieves higher success rates across tasks

02 Converges faster than prior methods

03 Requires less human intervention

Abstract

While reinforcement learning (RL) enables robots to acquire skills autonomously, its real-world deployment is severely limited by inefficient and unsafe exploration. Human-in-the-loop interventions offer a practical solution, yet existing methods typically exploit these interventions as auxiliary training signals, without fully capturing the richer information they provide about when and how autonomy should be guided. Human interventions often encode relative preferences over behavior under safety and task constraints, rather than prescribing exact actions to imitate. Motivated by this perspective, we propose Online Human Preference as Guidance in Reinforcement Learning (OHP-RL), a framework that leverages human interventions as preference information to guide policy learning. OHP-RL introduces a state-dependent preference gate that adaptively regulates when and to what extent human…
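
To make the idea of a state-dependent preference gate more concrete, here is a minimal sketch. It is not the paper's formulation, which the truncated abstract does not spell out. It assumes a generic setup: a logistic gate over state features scales a Bradley-Terry preference loss, and that term is added to the RL loss only on steps where a human intervened. All names (`gate`, `preference_loss`, `gated_loss`, `weights`, `bias`) are hypothetical and introduced only for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def preference_loss(score_preferred, score_rejected):
    """Bradley-Terry negative log-likelihood: -log sigmoid(preferred - rejected).

    Assumption: the human's action is 'preferred' and the policy's action is 'rejected'.
    """
    d = score_preferred - score_rejected
    # Stable form of log(1 + exp(-d)).
    return math.log1p(math.exp(-d)) if d >= 0 else -d + math.log1p(math.exp(d))


def gate(state_features, weights, bias):
    """Hypothetical state-dependent gate in (0, 1): a logistic function of the state."""
    z = sum(w * x for w, x in zip(weights, state_features)) + bias
    return sigmoid(z)


def gated_loss(rl_loss, state_features, weights, bias,
               score_human, score_policy, intervened):
    """Blend the RL loss with a gated preference term.

    Without an intervention there is no preference signal, so the RL loss is
    returned unchanged. With one, the gate decides how strongly the
    preference term counts in this state.
    """
    if not intervened:
        return rl_loss
    g = gate(state_features, weights, bias)
    return rl_loss + g * preference_loss(score_human, score_policy)
```

In this sketch the gate only scales the preference term, so states where the gate is near zero are learned almost entirely from the RL objective, while states where it is near one lean on the human's preference.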
