RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

Mel Vecerik; Carl Doersch; Yi Yang; Todor Davchev; Yusuf Aytar; Guangyao Zhou; Raia Hadsell; Lourdes Agapito; Jon Scholz

arXiv:2308.15975·cs.RO·September 1, 2023

TL;DR

RoboTAP uses dense point tracking from Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration and parameterize a low-level controller, so that robots can learn object-arrangement and path-following tasks from minimal demonstrations.

Contribution

The paper combines dense point tracking with a low-level controller so that robots can learn new tasks from demonstrations quickly, and can generalize across changes in the scene configuration, without task-specific engineering.

Findings

01 Robust policies for shape-matching and stacking are learned from minutes-long demonstrations.

02 Path-following tasks, such as applying glue and sticking objects together, are also demonstrated.

03 The method is reported to improve on previous approaches in data efficiency and task generality.

Abstract

For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach utilizes Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration. We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that…
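The idea in the abstract, using tracked points to isolate the relevant motion and then driving a low-level controller toward the demonstrated configuration, can be illustrated with a small sketch. This is not the authors' implementation: the function names `select_moving_points` and `servo_step`, the displacement threshold `min_motion`, and the proportional `gain` are all hypothetical, and the point tracks are assumed to come from some TAP-style tracker as lists of pixel coordinates.

```python
from math import hypot


def select_moving_points(tracks, min_motion=5.0):
    """Return indices of tracks whose start-to-end displacement exceeds min_motion.

    tracks: list of trajectories, each a list of (x, y) pixel tuples.
    Assumption: "relevant" is approximated here by "moved during the demo".
    """
    selected = []
    for i, traj in enumerate(tracks):
        (x0, y0), (x1, y1) = traj[0], traj[-1]
        if hypot(x1 - x0, y1 - y0) > min_motion:
            selected.append(i)
    return selected


def servo_step(current, goal, visible, gain=0.5):
    """Proportional image-space step toward the goal point positions.

    Averages (goal - current) over visible points and scales by gain.
    Returns (0.0, 0.0) when no point is visible.
    """
    dx = dy = 0.0
    n = 0
    for (cx, cy), (gx, gy), vis in zip(current, goal, visible):
        if vis:
            dx += gx - cx
            dy += gy - cy
            n += 1
    if n == 0:
        return (0.0, 0.0)
    return (gain * dx / n, gain * dy / n)


# Hypothetical data: one point moves 10 px in the demo, the other barely moves.
demo_tracks = [[(0.0, 0.0), (10.0, 0.0)], [(5.0, 5.0), (5.0, 6.0)]]
relevant = select_moving_points(demo_tracks)

# One control step that pulls the current points toward their goal positions.
step = servo_step(
    current=[(0.0, 0.0), (2.0, 2.0)],
    goal=[(4.0, 0.0), (2.0, 6.0)],
    visible=[True, True],
)
```

In this toy version the step is a 2D image-space displacement; mapping it to an actual robot command would need a camera-to-robot model that the sketch leaves out.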

Code & Models

Repositories

google-deepmind/tapnet (JAX)

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Human Pose and Action Recognition