3PoinTr: 3D Point Tracks for Robot Manipulation Pretraining from Casual Videos

Adam Hung; Bardienus Pieter Duisterhof; Jeffrey Ichnowski

arXiv:2603.08485 · cs.RO · March 10, 2026

Open Access

TL;DR

3PoinTr pretrains robot manipulation policies on casual human videos by using a transformer to predict 3D point tracks, yielding policies that generalize spatially from only 20 demonstrations.

Contribution

The paper presents a novel approach using 3D point track prediction with a transformer architecture for embodiment-agnostic robot policy pretraining from unconstrained human videos.
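
As a rough illustration of what a 3D point track is, the sketch below stores the 3D positions of a set of scene points over time and computes how far each point moved. It is not the paper's implementation: the `tracks[t][n]` layout and the helper names `translate_tracks` and `net_displacement` are assumptions made for this example. The point is only that such a record describes how the scene moves, without referring to whether a human hand or a robot gripper caused the motion.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
# Hypothetical layout: tracks[t][n] is the 3D position of point n at time step t.
PointTracks = List[List[Point3D]]


def translate_tracks(points: List[Point3D], offset: Point3D, num_steps: int) -> PointTracks:
    """Build toy tracks in which every point moves linearly by `offset`."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    tracks: PointTracks = []
    for t in range(num_steps):
        alpha = t / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
        tracks.append([
            (x + alpha * offset[0], y + alpha * offset[1], z + alpha * offset[2])
            for (x, y, z) in points
        ])
    return tracks


def net_displacement(tracks: PointTracks) -> List[Point3D]:
    """Per-point displacement between the first and the last time step."""
    first, last = tracks[0], tracks[-1]
    return [(b[0] - a[0], b[1] - a[1], b[2] - a[2]) for a, b in zip(first, last)]


# Two points on an object that is pushed 0.2 m along the y axis over 5 steps.
object_points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
tracks = translate_tracks(object_points, offset=(0.0, 0.2, 0.0), num_steps=5)
print(len(tracks), len(tracks[0]))
print(net_displacement(tracks))
```

In this toy setup the same tracks would result whether a person or a robot pushed the object, which is the sense in which the representation can be called embodiment-agnostic.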

Findings

01 Achieves robust spatial generalization with only 20 demonstrations.

02 Outperforms existing behavior cloning and pretraining methods.

03 Produces more accurate 3D point tracks than baseline models.

Abstract

Data-efficient training of robust robot policies is the key to unlocking automation in a wide array of novel tasks. Current systems require large volumes of demonstrations to achieve robustness, which is impractical in many applications. Learning policies directly from human videos is a promising alternative that removes teleoperation costs, but it shifts the challenge toward overcoming the embodiment gap (differences in kinematics and strategies between robots and humans), often requiring restrictive and carefully choreographed human motions. We propose 3PoinTr, a method for pretraining robot policies from casual and unconstrained human videos, enabling learning from motions natural for humans. 3PoinTr uses a transformer architecture to predict 3D point tracks as an intermediate embodiment-agnostic representation. 3D point tracks encode goal specifications, scene geometry, and…

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Social Robot Interaction and HRI