Learning Latent Plans from Play
Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet

TL;DR
This paper introduces a self-supervised approach called Play-LMP that learns to organize and reuse behaviors from large-scale human teleoperated play data, enabling robots to perform diverse manipulation tasks more robustly and efficiently.
Contribution
The work presents a novel self-supervised method that leverages unlabeled play data to learn a latent space of behaviors, improving generalization and robustness in robotic skill learning.
Findings
Play-LMP outperforms individual expert-trained policies on 18 manipulation tasks
Play-supervised models are more robust to perturbations and exhibit retrying-until-success behavior
Latent space organizes around functional tasks without labels
Abstract
Acquiring a diverse repertoire of general-purpose skills remains an open challenge for robotics. In this work, we propose self-supervising control on top of human teleoperated play data as a way to scale up skill learning. Play has two properties that make it attractive compared to conventional task demonstrations. Play is cheap, as it can be collected in large quantities quickly without task segmenting, labeling, or resetting to an initial state. Play is naturally rich, covering ~4x more interaction space than task demonstrations for the same amount of collection time. To learn control from play, we introduce Play-LMP, a self-supervised method that learns to organize play behaviors in a latent space, then reuse them at test time to achieve specific goals. Combining self-supervised control with a diverse play dataset shifts the focus of skill learning from a narrow and discrete set of…
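One plausible reading of "self-supervising control on top of play data" is hindsight relabeling: cut a random window out of an unsegmented play log and treat the window's final state as the goal that the preceding actions reached. The sketch below illustrates only that idea under this assumption. The function name `sample_goal_window`, the window-length bounds, and the tuple-based state and action types are hypothetical and are not taken from the authors' code.

```python
import random
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


def sample_goal_window(
    states: Sequence[State],
    actions: Sequence[Action],
    min_len: int,
    max_len: int,
    rng: random.Random,
) -> Tuple[State, State, List[Tuple[State, Action]]]:
    """Cut a random window from an unlabeled play log and relabel it.

    Returns (current_state, goal_state, state_action_pairs), where the goal
    is simply the last state of the window. No task labels or segment
    boundaries are needed (assumed relabeling scheme, for illustration).
    """
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    if not 1 <= min_len <= max_len <= len(states):
        raise ValueError("need 1 <= min_len <= max_len <= len(states)")

    length = rng.randint(min_len, max_len)            # inclusive bounds
    start = rng.randint(0, len(states) - length)      # window fits in the log
    window_states = states[start:start + length]
    window_actions = actions[start:start + length]

    current = window_states[0]
    goal = window_states[-1]                          # hindsight goal
    return current, goal, list(zip(window_states, window_actions))


if __name__ == "__main__":
    # Toy 1-D "play log": the state is just the time index.
    log_states = [(float(t),) for t in range(50)]
    log_actions = [(1.0,) for _ in range(50)]
    cur, goal, pairs = sample_goal_window(
        log_states, log_actions, 5, 10, random.Random(0)
    )
    print(f"current={cur}, goal={goal}, window length={len(pairs)}")
```

Each sampled triple can then serve as a training example for a goal-conditioned policy. The latent-space encoding that the abstract attributes to Play-LMP would sit on top of this sampling step and is not shown here.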
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI
