Formulating Reinforcement Learning for Human-Robot Collaboration through Off-Policy Evaluation

Saurav Singh; Rodney Sanchez; Alexander Ororbia; Jamison Heard

arXiv:2602.02530 · cs.LG · February 4, 2026

Open Access

TL;DR

This paper introduces a novel offline reinforcement learning framework that uses off-policy evaluation to select optimal state representations and reward functions for human-robot collaboration, reducing reliance on costly real-time interactions.

Contribution

It proposes an OPE-based method for automatic selection of state spaces and reward functions in RL, validated on simulated and real-world human-robot interaction environments.

Findings

01. Effective in selecting high-performing policies using logged data
02. Reduces need for environment interaction during RL setup
03. Applicable to complex human-robot collaboration scenarios

Abstract

Reinforcement learning (RL) has the potential to transform real-world decision-making systems by enabling autonomous agents to learn from experience. Deploying RL in real-world settings, especially in the context of human-robot interaction, requires defining state representations and reward functions, which are critical for learning efficiency and policy performance. Traditional RL approaches often rely on domain expertise and trial-and-error, necessitating extensive human involvement as well as direct interaction with the environment, which can be costly and impractical, especially in complex and safety-critical applications. This work proposes a novel RL framework that leverages off-policy evaluation (OPE) for state space and reward function selection, using only logged interaction data. This approach eliminates the need for real-time access to the environment or human-in-the-loop…
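
To make the idea of selecting among candidate formulations from logged data more concrete, here is a minimal sketch of one common OPE estimator, weighted importance sampling, used to rank candidate policies. It is an illustration under stated assumptions, not the authors' method: the abstract does not say which estimator the paper uses, and the names `weighted_is_estimate` and `select_best`, the integer state and action encoding, and the logged-step layout are all hypothetical. The sketch also assumes every candidate is scored against the same logged reward signal, so that the estimates are comparable.

```python
from typing import Callable, Dict, List, Tuple

# One logged step: (state, action, reward, behavior_prob), where behavior_prob
# is the probability the logging policy assigned to the action it took.
# Integer states and actions are a simplification chosen for this sketch.
Step = Tuple[int, int, float, float]
Trajectory = List[Step]
# A candidate policy maps (state, action) to the probability of that action.
Policy = Callable[[int, int], float]


def weighted_is_estimate(
    trajs: List[Trajectory], policy: Policy, gamma: float = 0.99
) -> float:
    """Weighted importance sampling estimate of expected discounted return."""
    weights: List[float] = []
    returns: List[float] = []
    for traj in trajs:
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward, behavior_prob in traj:
            # Reweight by how much more (or less) likely the candidate
            # policy is to take the logged action than the logging policy.
            weight *= policy(state, action) / behavior_prob
            ret += discount * reward
            discount *= gamma
        weights.append(weight)
        returns.append(ret)
    total = sum(weights)
    if total == 0.0:
        # No logged trajectory is possible under this candidate policy.
        return 0.0
    return sum(w * g for w, g in zip(weights, returns)) / total


def select_best(
    trajs: List[Trajectory], candidates: Dict[str, Policy], gamma: float = 0.99
) -> str:
    """Return the name of the candidate with the highest OPE estimate."""
    scores = {
        name: weighted_is_estimate(trajs, policy, gamma)
        for name, policy in candidates.items()
    }
    return max(scores, key=scores.get)
```

In this sketch, each entry in `candidates` would stand for a policy trained offline under one candidate state representation and reward function, and `select_best` picks the formulation whose policy scores highest on the logged data, with no further interaction with the environment.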

Taxonomy

Topics: Reinforcement Learning in Robotics · Human-Automation Interaction and Safety · Social Robot Interaction and HRI