TL;DR
This paper introduces MYOE, a self-imitation framework for robot reinforcement learning from limited demonstrations. It uses preference-based goal estimation to improve robustness and out-of-sample performance.
Contribution
The paper proposes the queryable mixture-of-preferences state space model (QMoP-SSM) and preference regret optimization, which together enable robots to learn complex behaviors from scarce demonstration data.
Findings
The proposed method is robust and adaptable
It outperforms state-of-the-art RLfD schemes in experiments
It remains effective when demonstration data is limited
Abstract
Robot reinforcement learning from demonstrations (RLfD) assumes that expert data is abundant; this is usually unrealistic in the real world given data scarcity as well as high collection cost. Furthermore, imitation learning algorithms assume that the data is independently and identically distributed, which ultimately results in poorer performance as gradual errors emerge and compound within test-time trajectories. We address these issues by introducing the "master your own expertise" (MYOE) framework, a self-imitation framework that enables robotic agents to learn complex behaviors from limited demonstration data samples. Inspired by human perception and action, we propose and design what we call the queryable mixture-of-preferences state space model (QMoP-SSM), which estimates the desired goal at every time step. These desired goals are used in computing the "preference regret", which…
