Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret

Viet Dung Nguyen; Yuhang Song; Anh Nguyen; Jamison Heard; Reynold Bailey; Alexander Ororbia

arXiv:2604.03523·cs.RO·April 7, 2026

TL;DR

This paper introduces MYOE ("master your own expertise"), a self-imitation framework for robot reinforcement learning from limited demonstrations. It uses preference-based goal estimation to improve robustness and out-of-sample performance.

Contribution

The paper proposes the queryable mixture-of-preferences state space model (QMoP-SSM) and preference regret optimization, enabling robots to learn complex behaviors from scarce demonstration data.

Findings

01 Demonstrates robustness and adaptability of the proposed method

02 Outperforms state-of-the-art RLfD schemes in experiments

03 Effective in limited demonstration data scenarios

Abstract

Robot reinforcement learning from demonstrations (RLfD) assumes that expert data is abundant; this is usually unrealistic in the real world given data scarcity as well as high collection cost. Furthermore, imitation learning algorithms assume that the data is independently and identically distributed, which ultimately results in poorer performance as gradual errors emerge and compound within test-time trajectories. We address these issues by introducing the "master your own expertise" (MYOE) framework, a self-imitation framework that enables robotic agents to learn complex behaviors from limited demonstration data samples. Inspired by human perception and action, we propose and design what we call the queryable mixture-of-preferences state space model (QMoP-SSM), which estimates the desired goal at every time step. These desired goals are used in computing the "preference regret", which…
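
The compounding-error point in the abstract can be illustrated with a toy calculation. This sketch is not from the paper and does not implement MYOE or QMoP-SSM. It assumes a cloned policy whose every action is off by a small fixed amount and which has only seen on-trajectory states, so it never corrects its own drift. The function names, the 1-D deviation model, and the values 0.01 and 100 are hypothetical choices made for illustration only.

```python
def rollout_deviation(per_step_error: float, horizon: int) -> list[float]:
    """Deviation from the expert path at each step of a closed-loop rollout.

    Toy assumption: the policy adds the same small error at every step and
    cannot correct it, so the deviation accumulates over time.
    """
    deviation = 0.0
    history = []
    for _ in range(horizon):
        deviation += per_step_error  # error made now carries into all later states
        history.append(deviation)
    return history


def iid_cost(per_step_error: float, horizon: int) -> float:
    """Total cost if each step were scored independently on expert states."""
    return per_step_error * horizon  # grows linearly with the horizon


def rollout_cost(per_step_error: float, horizon: int) -> float:
    """Total cost along the policy's own trajectory (sum of deviations)."""
    return sum(rollout_deviation(per_step_error, horizon))  # grows ~quadratically


if __name__ == "__main__":
    eps, horizon = 0.01, 100  # hypothetical values
    print("i.i.d. estimate:", iid_cost(eps, horizon))
    print("closed-loop cost:", rollout_cost(eps, horizon))
```

With a per-step error of 0.01 over 100 steps, the i.i.d. estimate is about 1.0, while the closed-loop cost is about 50.5, because the deviation at step t is 0.01·t and these deviations add up. The gap widens as the horizon grows, which is the failure mode the abstract attributes to the i.i.d. assumption.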


Code & Models

Repositories

rxng8/neurorobot-preference-regret-learning (GitHub)
