Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models

Nils Blank; Moritz Reuss; Marcel Rühle; Ömer Erdinç Yağmurlu; Fabian Wenzel; Oier Mees; Rudolf Lioutikov

arXiv:2410.17772·cs.RO·October 29, 2024

TL;DR

This paper introduces NILS, a zero-shot method that uses foundation models to annotate large, unstructured robot datasets automatically, without human input, so that language-conditioned robot policy learning can scale.

Contribution

NILS combines pretrained vision-language foundation models to label robot data automatically and at scale. The resulting annotations are reported to be more diverse and of higher quality than crowdsourced human annotations.

Findings

01. NILS labeled over 115,000 robot trajectories.

02. It improved annotation diversity and quality compared to crowdsourced labels.

03. The method enables scalable, zero-shot annotation of unstructured robot datasets.

Abstract

A central challenge towards developing robots that can relate human language to their perception and actions is the scarcity of natural language annotations in diverse robot datasets. Moreover, robot policies that follow natural language instructions are typically trained on either templated language or expensive human-labeled instructions, hindering their scalability. To this end, we introduce NILS: Natural language Instruction Labeling for Scalability. NILS automatically labels uncurated, long-horizon robot data at scale in a zero-shot manner without any human intervention. NILS combines pretrained vision-language foundation models in order to detect objects in a scene, detect object-centric changes, segment tasks from large datasets of unlabelled interaction data and ultimately label behavior datasets. Evaluations on BridgeV2, Fractal, and a kitchen play dataset show that NILS can…
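
The pipeline stages named in the abstract (detect objects, detect object-centric changes, segment tasks, label segments) can be illustrated with a toy sketch. This is not the NILS implementation: no foundation models are called, the per-frame object positions are hand-written stand-ins for detector output, and every name here (`Frame`, `Segment`, `moved_object`, `segment_and_label`, the `threshold` parameter, the "move the ..." label template) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for detector output: one dict per frame,
# mapping object name -> (x, y) position in the scene.
Frame = Dict[str, Tuple[float, float]]


@dataclass
class Segment:
    start: int   # index of the first frame of the segment
    end: int     # index of the last frame of the segment
    obj: str     # object whose state changed during the segment
    label: str   # templated instruction (assumption: real labels are richer)


def moved_object(prev: Frame, curr: Frame, threshold: float) -> Optional[str]:
    """Return the object that moved the most between two frames,
    or None if no object moved farther than `threshold`."""
    best, best_d = None, threshold
    for name, pos in curr.items():
        if name in prev:
            d = dist(prev[name], pos)
            if d > best_d:
                best, best_d = name, d
    return best


def segment_and_label(frames: List[Frame], threshold: float = 0.05) -> List[Segment]:
    """Split a trajectory wherever the moving object changes and
    attach a templated instruction to each resulting segment."""
    segments: List[Segment] = []
    active: Optional[str] = None
    start = 0
    for t in range(1, len(frames)):
        obj = moved_object(frames[t - 1], frames[t], threshold)
        if obj != active:
            if active is not None:
                segments.append(Segment(start, t - 1, active, f"move the {active}"))
            active, start = obj, t - 1
    if active is not None:
        segments.append(Segment(start, len(frames) - 1, active, f"move the {active}"))
    return segments


# Toy trajectory: the cup moves first, then the pot.
frames: List[Frame] = [
    {"cup": (0.0, 0.0), "pot": (1.0, 1.0)},
    {"cup": (0.0, 0.0), "pot": (1.0, 1.0)},
    {"cup": (0.2, 0.0), "pot": (1.0, 1.0)},
    {"cup": (0.4, 0.0), "pot": (1.0, 1.0)},
    {"cup": (0.4, 0.0), "pot": (1.0, 1.0)},
    {"cup": (0.4, 0.0), "pot": (1.0, 0.5)},
]
segments = segment_and_label(frames)
for s in segments:
    print(s.start, s.end, s.label)
```

The sketch only shows the idea of using object-centric changes as segment boundaries. The displacement threshold and the label template are simplifications for the example, not choices taken from the paper.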

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics