OHP-RL: Online Human Preference as Guidance in Reinforcement Learning for Robot Manipulation
Yunyang Mo (1), Jian Li (1), Qiwei Wu (1), Yihang Kang (1), Renjing Xu (1) ((1) The Hong Kong University of Science and Technology (Guangzhou))

TL;DR
OHP-RL is a reinforcement learning framework that treats human interventions as preference information to guide policy learning, aiming at safer, more efficient robot manipulation with less human effort and improved stability.
Contribution
The paper introduces a state-dependent preference gate that incorporates intermittent human feedback into RL for robot manipulation tasks.
Findings
Achieves higher success rates across tasks
Converges faster than prior methods
Requires less human intervention
Abstract
While reinforcement learning (RL) enables robots to acquire skills autonomously, its real-world deployment is severely limited by inefficient and unsafe exploration. Human-in-the-loop interventions offer a practical solution, yet existing methods typically exploit these interventions as auxiliary training signals, without fully capturing the richer information they provide about when and how autonomy should be guided. Human interventions often encode relative preferences over behavior under safety and task constraints, rather than prescribing exact actions to imitate. Motivated by this perspective, we propose Online Human Preference as Guidance in Reinforcement Learning (OHP-RL), a framework that leverages human interventions as preference information to guide policy learning. OHP-RL introduces a state-dependent preference gate that adaptively regulates when and to what extent human…
