Any-point Trajectory Modeling for Policy Learning

Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel

arXiv:2401.00025 · cs.RO · July 15, 2024 · 1 cites

TL;DR

This paper introduces Any-point Trajectory Modeling (ATM), a framework that uses video demonstrations to improve robot policy learning: a trajectory model pre-trained on videos predicts the future trajectories of arbitrary points in a frame, which significantly reduces the need for action-labeled data.

Contribution

ATM pre-trains a trajectory model on videos that carry no action labels. The predicted point trajectories then serve as control guidance, so a robust visuomotor policy can be learned from minimal action-labeled data.

Findings

01. ATM outperforms video pre-training baselines by 80% on average across tasks.

02. ATM transfers manipulation skills effectively from human videos and from videos of a different robot.

03. ATM is evaluated on over 130 language-conditioned tasks in simulation and in the real world.

Abstract

Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Across over 130 language-conditioned tasks we evaluated in both simulation and the real world,…
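
The two stages described in the abstract can be pictured with a toy sketch. It is not the authors' implementation: every function name, the fixed-velocity "model", the averaging "policy", and the values 32 points, 16 steps, and a 128 x 128 frame are assumptions chosen only to show how data flows from sampled points to predicted trajectories to an action.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates in a frame
Track = List[Point]          # one point's predicted positions over future steps


def sample_query_points(n: int, width: int, height: int, seed: int = 0) -> List[Point]:
    """Pick n arbitrary points inside a width x height frame."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def predict_tracks(points: List[Point], velocity: Point, horizon: int) -> List[Track]:
    """Hypothetical stand-in for the pre-trained trajectory model.

    A learned model would condition on the video frame and the language
    instruction; this placeholder moves every point at one fixed velocity.
    """
    vx, vy = velocity
    return [[(x + vx * t, y + vy * t) for t in range(1, horizon + 1)]
            for x, y in points]


def policy_action(points: List[Point], tracks: List[Track]) -> Point:
    """Hypothetical stand-in for the trajectory-guided policy.

    Returns the mean first-step displacement of the points as a 2D action.
    A learned policy would be trained on a small action-labeled dataset.
    """
    n = len(points)
    dx = sum(track[0][0] - p[0] for p, track in zip(points, tracks)) / n
    dy = sum(track[0][1] - p[1] for p, track in zip(points, tracks)) / n
    return (dx, dy)


# Stage 1: predict future trajectories for arbitrary points (no action labels).
points = sample_query_points(n=32, width=128, height=128)
tracks = predict_tracks(points, velocity=(1.5, -0.5), horizon=16)

# Stage 2: the policy consumes the predicted trajectories as guidance.
action = policy_action(points, tracks)
```

The point of the split is that only the second stage needs action labels. The first stage can, in principle, be trained on any video for which point tracks are available.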

Code & Models

Repositories

large-trajectory-model/atm
pytorch

Taxonomy

Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Multimodal Machine Learning Applications