Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

Homanga Bharadhwaj; Roozbeh Mottaghi; Abhinav Gupta; Shubham Tulsiani

arXiv:2405.01527 · cs.RO · August 12, 2024

Open Access

TL;DR

Track2Act leverages web videos to predict point tracks and infer manipulation plans, enabling zero-shot generalizable robot manipulation across unseen objects and scenes with minimal robot-specific data.

Contribution

The paper introduces a framework that learns from web videos to predict how points in an image should move toward a goal, and converts these point tracks into manipulation plans, reducing reliance on large robot demonstration datasets.

Findings

1. Enables zero-shot manipulation of unseen objects in novel scenes.
2. Combines predictions learned from web videos with a small number of robot demonstrations.
3. Performs diverse real-world manipulation tasks with minimal in-domain data.

Abstract

We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation: interacting with unseen objects in novel scenes without test-time adaptation. While typical approaches rely on a large amount of demonstration data for such generalization, we propose an approach that leverages web videos to predict plausible interaction plans and learns a task-agnostic transformation to obtain robot actions in the real world. Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps based on a goal, and can be trained with diverse videos on the web, including those of humans and robots manipulating everyday objects. We use these 2D track predictions to infer a sequence of rigid transforms of the object to be manipulated, and obtain robot end-effector poses that can be executed in an open-loop manner. We then refine this open-loop…
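The step from track predictions to a sequence of rigid object transforms, and from those to end-effector poses, can be illustrated with a short sketch. The abstract does not say how this inference is performed, so the code below is not the authors' method: it assumes the tracked points have already been lifted to 3D (for example with a depth camera), swaps in the standard Kabsch least-squares rigid fit, and assumes a rigid grasp so that the gripper moves with the object. The function names `fit_rigid_transform`, `object_transforms_from_tracks`, and `end_effector_poses` are hypothetical.

```python
import numpy as np


def fit_rigid_transform(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Kabsch least-squares fit of a rigid transform mapping src -> dst.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Returns a 4x4 homogeneous matrix T such that dst ~= R @ src + t.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: flip the last axis if needed so det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def object_transforms_from_tracks(tracks_3d: np.ndarray) -> list:
    """Hypothetical helper: tracks_3d has shape (T, N, 3).

    Fits each frame against frame 0, giving one world-frame object
    displacement per time step (the first one is the identity).
    """
    return [fit_rigid_transform(tracks_3d[0], tracks_3d[k])
            for k in range(len(tracks_3d))]


def end_effector_poses(obj_transforms: list, grasp_pose: np.ndarray) -> list:
    """Assumes a rigid grasp: the gripper moves with the object, so the
    pose at step k is the object displacement applied to the initial
    4x4 grasp pose, T_ee_k = T_obj_k @ T_ee_0."""
    return [T @ grasp_pose for T in obj_transforms]
```

For example, stacking the lifted tracks into a `(T, N, 3)` array and passing the resulting transforms together with the initial grasp pose yields one 4x4 end-effector pose per time step, which could then be executed open-loop. Fitting every frame against frame 0, rather than chaining frame-to-frame fits, avoids accumulating drift over the sequence.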

Taxonomy

Topics: Robot Manipulation and Learning · Adversarial Robustness in Machine Learning · Human Pose and Action Recognition