PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining

Garrett Thomas; Ching-An Cheng; Ricky Loynd; Felipe Vieira Frujeri; Vibhav Vineet; Mihai Jalobeanu; Andrey Kolobov

arXiv:2303.08789 · cs.RO · November 10, 2023 · 1 citation

Open Access

TL;DR

PLEX is a transformer-based architecture that learns robotic manipulation from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned manipulation videos. The trajectories induce a latent feature space, and the videos teach the model to plan in that space.

Contribution

The paper introduces PLEX, a novel architecture that combines small amounts of visuomotor trajectories with large-scale video data for effective robotic manipulation learning.

Findings

01. PLEX achieves state-of-the-art performance in Robosuite environments.

02. Relative positional encoding improves learning in low-data regimes.

03. PLEX generalizes well to unseen tasks in Meta-World.

Abstract

A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos -- a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX's generalization on Meta-World and SOTA performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX's transformers greatly helps in low-data regimes of learning from…
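The abstract credits relative positional encoding with helping in low-data regimes. As a rough illustration of the general idea only, the sketch below adds a bias indexed by the clipped offset between two timesteps to the attention logits, so the positional signal depends on how far apart two steps are rather than on their absolute indices. This is one common form of relative positional encoding, not PLEX's implementation. The function names, the clipping distance, and the toy bias values are all assumptions made for the example.

```python
import math


def relative_offsets(seq_len, max_distance):
    """Return a seq_len x seq_len table of offsets (j - i), clipped to +/- max_distance."""
    return [
        [max(-max_distance, min(max_distance, j - i)) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attention_weights(queries, keys, bias_by_offset, max_distance):
    """Softmax attention weights with an additive relative-position bias.

    bias_by_offset maps each clipped offset in [-max_distance, max_distance]
    to a scalar. In a trained model these would be learned parameters; here
    they are hand-picked (an assumption for illustration).
    """
    seq_len = len(queries)
    dim = len(queries[0])
    offsets = relative_offsets(seq_len, max_distance)
    weights = []
    for i in range(seq_len):
        # Scaled dot-product content score plus the offset-dependent bias.
        logits = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(dim)
            + bias_by_offset[offsets[i][j]]
            for j in range(seq_len)
        ]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical toy inputs: identical content at every step, so only the bias matters.
queries = [[1.0, 0.0]] * 4
keys = [[1.0, 0.0]] * 4
bias = {d: -float(abs(d)) for d in range(-2, 3)}  # favor nearby timesteps
weights = attention_weights(queries, keys, bias, max_distance=2)
```

Because the bias table is indexed by offset, the same few parameters are reused at every position in the sequence, which is one plausible reason such encodings can be data-efficient. The sketch itself does not demonstrate that effect.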

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications