Learning from Suboptimal Demonstration via Self-Supervised Reward Regression

Letian Chen; Rohan Paleja; Matthew Gombolay

arXiv:2010.11723·cs.RO·November 24, 2020·31 cites

TL;DR

This paper introduces a self-supervised reward regression method that learns from suboptimal demonstrations, reaching ~0.95 correlation with the ground-truth reward and improving policy performance by ~200% over the suboptimal demonstrations in robotic tasks.

Contribution

The authors develop a new approach that bootstraps suboptimal demonstrations to synthesize optimality-parameterized data for training reward functions, overcoming limitations of previous ranking-based methods.

Findings

01. Achieves ~0.95 correlation with ground-truth reward

02. Policy improvements of ~200% over suboptimal demos

03. Physical robot demonstration producing faster shots with more topspin

Abstract

Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform a task by providing a human demonstration. However, modern LfD techniques, e.g., inverse reinforcement learning (IRL), assume users provide at least stochastically optimal demonstrations. This assumption fails to hold in most real-world scenarios. Recent attempts to learn from suboptimal demonstrations leverage pairwise rankings and follow the Luce-Shepard rule. However, we show these approaches make incorrect assumptions and thus suffer from brittle, degraded performance. We overcome these limitations by developing a novel approach that bootstraps off suboptimal demonstrations to synthesize optimality-parameterized data to train an idealized reward function. We empirically validate that we learn an idealized reward function with ~0.95 correlation with…
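
For readers unfamiliar with the Luce-Shepard rule that the abstract attributes to earlier ranking-based approaches: it models the probability that one trajectory is preferred over another as a softmax over their returns. The following is a minimal sketch of that rule and the pairwise loss it induces, not the authors' code and not the SSRR method itself. The function names are hypothetical, and each return is assumed to be a scalar sum of predicted per-step rewards.

```python
import math


def luce_shepard_preference(return_a: float, return_b: float) -> float:
    """Probability that trajectory A is preferred over B.

    Luce-Shepard rule: P(A > B) = exp(R_A) / (exp(R_A) + exp(R_B)).
    """
    # Subtract the larger return before exponentiating to avoid overflow.
    m = max(return_a, return_b)
    exp_a = math.exp(return_a - m)
    exp_b = math.exp(return_b - m)
    return exp_a / (exp_a + exp_b)


def pairwise_ranking_loss(return_a: float, return_b: float, a_preferred: bool) -> float:
    """Negative log-likelihood of one observed ranking under the rule above.

    Computed as logsumexp(R_A, R_B) - R_chosen, which stays finite even
    when the two returns differ by a large margin.
    """
    m = max(return_a, return_b)
    log_sum_exp = m + math.log(math.exp(return_a - m) + math.exp(return_b - m))
    chosen = return_a if a_preferred else return_b
    return log_sum_exp - chosen


if __name__ == "__main__":
    # Toy returns chosen only for illustration.
    print(luce_shepard_preference(2.0, 0.0))
    print(pairwise_ranking_loss(2.0, 0.0, a_preferred=True))
```

A reward model trained this way only sees which trajectory ranks higher, so the loss depends on the difference between the two returns rather than on their absolute values.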

Code & Models

Repositories

CORE-Robotics-Lab/SSRR (tf · Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Machine Learning and Data Classification