Learning to Discern: Imitating Heterogeneous Human Demonstrations with Preference and Representation Learning

Sachit Kuhar, Shuo Cheng, Shivang Chopra, Matthew Bronars, and Danfei Xu

arXiv:2310.14196 · cs.RO · May 7, 2025 · 2 cites

TL;DR

This paper introduces Learning to Discern (L2D), an offline imitation learning framework that uses preference and representation learning to evaluate and learn from heterogeneous human demonstrations of varying quality, improving policy performance.

Contribution

L2D is a novel framework that learns a latent representation and quality evaluator from limited labeled demonstrations, handling diverse styles and suboptimal data in imitation learning.

Findings

01. L2D effectively assesses demonstration quality across diverse styles.

02. L2D improves policy performance in simulation and real robot tasks.

03. The approach generalizes well to new demonstrators and styles.

Abstract

Practical Imitation Learning (IL) systems rely on large human demonstration datasets for successful policy learning. However, challenges lie in maintaining the quality of collected data and addressing the suboptimal nature of some demonstrations, which can compromise the overall dataset quality and hence the learning outcome. Furthermore, the intrinsic heterogeneity in human behavior can produce equally successful but disparate demonstrations, further exacerbating the challenge of discerning demonstration quality. To address these challenges, this paper introduces Learning to Discern (L2D), an offline imitation learning framework for learning from demonstrations with diverse quality and style. Given a small batch of demonstrations with sparse quality labels, we learn a latent representation for temporally embedded trajectory segments. Preference learning in this latent space trains a…
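
The abstract is cut off before it says what preference learning trains or which loss is used, so the following is only a minimal sketch of one common way to do preference learning over latent segment embeddings: a Bradley-Terry pairwise loss. Everything here is an assumption for illustration and not the paper's confirmed method: the linear evaluator, the mean aggregation over segments, and the names segment_score, trajectory_score, and preference_loss are all hypothetical.

```python
import math

# Hypothetical sketch: a Bradley-Terry preference loss over latent
# segment embeddings. The linear scorer and mean pooling are assumptions
# made for illustration; the source does not specify these details.

def segment_score(latent, weights):
    """Linear quality score for one latent segment embedding."""
    return sum(w * z for w, z in zip(weights, latent))

def trajectory_score(segments, weights):
    """Mean segment score over a trajectory's segment embeddings."""
    return sum(segment_score(s, weights) for s in segments) / len(segments)

def preference_loss(better, worse, weights):
    """Negative log-likelihood that `better` outranks `worse`.

    Under Bradley-Terry, P(better > worse) = sigmoid(score diff),
    so the loss is -log(sigmoid(diff)) = log(1 + exp(-diff)).
    """
    diff = trajectory_score(better, weights) - trajectory_score(worse, weights)
    # Numerically stable evaluation of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Toy usage: two demonstrations represented by made-up 2-D embeddings.
weights = [1.0, 0.0]
high_quality = [[2.0, 0.0], [2.0, 1.0]]
low_quality = [[0.0, 0.0]]
loss_correct_order = preference_loss(high_quality, low_quality, weights)
loss_wrong_order = preference_loss(low_quality, high_quality, weights)
```

In this toy setup the loss is small when the labeled-better demonstration already scores higher and large when the ordering is reversed, which is the signal a gradient-based learner would use to fit the evaluator from sparse quality labels.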

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning