Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality

Songyuan Zhang; Zhangjie Cao; Dorsa Sadigh; Yanan Sui

arXiv:2110.14754 · cs.LG · January 27, 2022 · 20 cites

TL;DR

This paper introduces Confidence-Aware Imitation Learning (CAIL), a framework that learns from demonstrations with varying optimality by jointly learning confidence scores and a policy trained on confidence-reweighted demonstrations. It outperforms existing imitation learning methods in simulated and real robot experiments.

Contribution

The paper proposes CAIL, a framework that jointly learns confidence scores and a policy from demonstrations with varying optimality, with theoretical convergence guarantees and stronger empirical results than existing imitation learning methods.

Findings

01 CAIL outperforms existing imitation learning methods in simulated and real robot experiments.

02 CAIL can learn successful policies even without access to optimal demonstrations.

03 Theoretical guarantees ensure convergence of the proposed framework.

Abstract

Most existing imitation learning approaches assume the demonstrations are drawn from experts who are optimal, but relaxing this assumption enables us to use a wider range of data. Standard imitation learning may learn a suboptimal policy from demonstrations with varying optimality. Prior works use confidence scores or rankings to capture beneficial information from demonstrations with varying optimality, but they suffer from many limitations, e.g., manually annotated confidence scores or high average optimality of demonstrations. In this paper, we propose a general framework to learn from demonstrations with varying optimality that jointly learns the confidence score and a well-performing policy. Our approach, Confidence-Aware Imitation Learning (CAIL), learns a well-performing policy from confidence-reweighted demonstrations, while using an outer loss to track the performance of our…
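
The bi-level idea in the abstract (an inner policy fit on confidence-reweighted demonstrations, and an outer loss that drives the confidence) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the policy is a one-parameter linear map a = theta * s, the inner step is a closed-form weighted least-squares fit, and the outer loss is a stand-in (squared error on a small trusted set), since the truncated abstract does not say what the paper's outer loss is. The names weighted_bc, outer_loss_grad, and learn_confidence are hypothetical.

```python
import numpy as np

def weighted_bc(s, a, w):
    """Inner problem: confidence-weighted least-squares fit of a = theta * s."""
    return np.sum(w * s * a) / np.sum(w * s * s)

def outer_loss_grad(theta, s_val, a_val):
    """Gradient w.r.t. theta of the stand-in outer loss mean((theta*s - a)^2)."""
    return 2.0 * np.mean((theta * s_val - a_val) * s_val)

def learn_confidence(s, a, s_val, a_val, steps=200, lr=1.0):
    """Alternate the inner fit with a gradient step on per-sample confidence."""
    w = np.full(s.shape, 0.5)  # start with equal confidence in every sample
    for _ in range(steps):
        theta = weighted_bc(s, a, w)
        # Derivative of the closed-form inner solution w.r.t. each weight w_i.
        dtheta_dw = s * (a - theta * s) / np.sum(w * s * s)
        # Chain rule: d(outer loss)/d(w_i) = d(outer)/d(theta) * d(theta)/d(w_i).
        grad_w = outer_loss_grad(theta, s_val, a_val) * dtheta_dw
        w = np.clip(w - lr * grad_w, 1e-3, 1.0)  # keep confidence in (0, 1]
    return w, weighted_bc(s, a, w)

# Synthetic data (assumption): 50 samples from a good demonstrator (a = 2s)
# followed by 50 from a poor one (a = -s), both with small action noise.
rng = np.random.default_rng(0)
n_good, n_bad = 50, 50
s = rng.uniform(-1.0, 1.0, n_good + n_bad)
noise = 0.05 * rng.standard_normal(n_good + n_bad)
slope = np.where(np.arange(n_good + n_bad) < n_good, 2.0, -1.0)
a = slope * s + noise

# Small trusted set used only by the stand-in outer loss (assumption).
s_val = rng.uniform(-1.0, 1.0, 20)
a_val = 2.0 * s_val

theta_plain = weighted_bc(s, a, np.ones_like(s))  # unweighted imitation
conf, theta_weighted = learn_confidence(s, a, s_val, a_val)

print("unweighted theta:", theta_plain)
print("confidence-weighted theta:", theta_weighted)
print("mean confidence (good, bad):", conf[:n_good].mean(), conf[n_good:].mean())
```

In this toy setting the confidence of the poor demonstrator's samples is pushed down and the reweighted fit moves toward the good demonstrator's slope. The trusted-set outer loss is the simplest choice that makes the sketch runnable; it should not be read as the criterion used in the paper.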

Code & Models

Videos

Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Machine Learning and Algorithms