Any-point Trajectory Modeling for Policy Learning
Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel

TL;DR
This paper introduces Any-point Trajectory Modeling (ATM), a framework that uses video demonstrations to improve robot policy learning by predicting the future trajectories of arbitrary points in a video frame, which significantly reduces the need for action-labeled data.
Contribution
ATM is a novel framework that extracts control guidance from videos by pre-training a trajectory model, which enables robust visuomotor policy learning with minimal action-labeled data.
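The notion of an any-point trajectory can be made concrete with a small sketch. This is not the authors' implementation: it assumes a track is a list of 2D pixel coordinates for one queried point over a fixed future horizon, and it swaps ATM's learned trajectory model for a hypothetical constant-velocity extrapolator (`extrapolate_tracks`) so the example runs on its own. The loss (`track_loss`, mean Euclidean error) is likewise an illustrative choice, not necessarily the objective used in the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates in a video frame
Track = List[Point]          # one point's positions over future timesteps


def extrapolate_tracks(queries: List[Point], velocities: List[Point],
                       horizon: int) -> List[Track]:
    """Hypothetical stand-in for ATM's learned trajectory model.

    Each query point may be any pixel in the frame, not only a predefined
    keypoint. Its position at timesteps 1..horizon is predicted by assuming
    a constant per-step velocity; ATM learns this mapping from video instead.
    """
    tracks: List[Track] = []
    for (x, y), (vx, vy) in zip(queries, velocities):
        tracks.append([(x + vx * t, y + vy * t) for t in range(1, horizon + 1)])
    return tracks


def track_loss(pred: List[Track], target: List[Track]) -> float:
    """Mean Euclidean distance between predicted and reference positions.

    Reference tracks are assumed to be extracted from video frames, so this
    training signal needs no action labels.
    """
    total, count = 0.0, 0
    for p_track, t_track in zip(pred, target):
        for (px, py), (tx, ty) in zip(p_track, t_track):
            total += ((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
            count += 1
    if count == 0:
        raise ValueError("no positions to compare")
    return total / count


if __name__ == "__main__":
    queries = [(10.0, 20.0), (50.0, 5.0)]
    tracks = extrapolate_tracks(queries, [(2.0, 0.0), (0.0, -1.0)], horizon=3)
    print(tracks[0])  # [(12.0, 20.0), (14.0, 20.0), (16.0, 20.0)]
```

In a real system only `extrapolate_tracks` would be replaced by a learned predictor; the track format and the action-free loss could stay the same.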
Findings
ATM outperforms video pre-training baselines by 80% on average across tasks.
Effective transfer of manipulation skills from human and cross-robot videos.
Successful application across more than 130 language-conditioned tasks in simulation and the real world.
Abstract
Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Across over 130 language-conditioned tasks we evaluated in both simulation and the real world,…
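The second stage described in the abstract, using predicted trajectories as control guidance for a policy trained on few action-labeled demonstrations, can be illustrated with an equally small sketch. All names are hypothetical and the model is deliberately trivial: the policy input is reduced to the mean first-step displacement of the predicted tracks (`track_feature`), and the "policy" is a per-axis linear gain fit by least squares (`fit_gains`). ATM itself learns a visuomotor policy network, so this only illustrates the data flow from predicted tracks to actions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates, or a 2D action
Track = List[Point]          # one point's predicted future positions


def track_feature(queries: List[Point], tracks: List[Track]) -> Point:
    """Hypothetical guidance feature: mean displacement from each query
    point to its first predicted future position."""
    n = len(queries)
    if n == 0:
        raise ValueError("need at least one query point")
    dx = sum(tr[0][0] - q[0] for q, tr in zip(queries, tracks)) / n
    dy = sum(tr[0][1] - q[1] for q, tr in zip(queries, tracks)) / n
    return (dx, dy)


def fit_gains(features: List[Point], actions: List[Point]) -> Point:
    """Per-axis least-squares gain g minimizing sum((a - g * f) ** 2) over a
    small set of action-labeled demos (no intercept term)."""
    gains = []
    for axis in (0, 1):
        num = sum(f[axis] * a[axis] for f, a in zip(features, actions))
        den = sum(f[axis] ** 2 for f in features)
        # An axis with no feature signal gets a zero gain.
        gains.append(num / den if den else 0.0)
    return (gains[0], gains[1])


def policy(gains: Point, feature: Point) -> Point:
    """Map a track-derived feature to a 2D action."""
    return (gains[0] * feature[0], gains[1] * feature[1])


if __name__ == "__main__":
    # Two hypothetical labeled demos: (track feature, action) pairs.
    gains = fit_gains([(2.0, 0.0), (4.0, -2.0)], [(1.0, 0.0), (2.0, -1.0)])
    print(gains)                      # (0.5, 0.5)
    print(policy(gains, (6.0, 2.0)))  # (3.0, 1.0)
```

The closed-form fit keeps the example short; a learned policy would replace both the hand-built feature and the linear map.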
Taxonomy
Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Multimodal Machine Learning Applications
