Dexterous Manipulation Policies from RGB Human Videos via 3D Hand-Object Trajectory Reconstruction
Hongyi Chen, Tony Dong, Tiancheng Wu, Liquan Wang, Yash Jangir, Yaru Niu, Yufei Ye, Homanga Bharadhwaj, Zackory Erickson, Jeffrey Ichnowski

TL;DR
This paper introduces VIDEOMANIP, a novel framework that learns dexterous robotic manipulation directly from RGB videos of humans performing tasks, eliminating the need for specialized sensors or large datasets.
Contribution
The work presents a new device-free approach that reconstructs 3D hand-object trajectories from monocular videos and trains manipulation policies without additional robot demonstrations.
Findings
Achieves 70.25% success rate in simulation across 20 objects.
Attains 62.86% success rate in real-world tasks, outperforming retargeting methods.
Enables generalizable manipulation policy learning from a single RGB video.
Abstract
Multi-finger robotic hand manipulation and grasping are challenging due to the high-dimensional action space and the difficulty of acquiring large-scale training data. Existing approaches largely rely on human teleoperation with wearable devices or specialized sensing equipment to capture hand-object interactions, which limits scalability. In this work, we propose VIDEOMANIP, a device-free framework that learns dexterous manipulation directly from RGB human videos. Leveraging recent advances in computer vision, VIDEOMANIP reconstructs explicit 3D robot-object trajectories from monocular videos by estimating human hand poses and object meshes, and retargeting the reconstructed human motions to robotic hands for manipulation learning. To make the reconstructed robot data suitable for dexterous manipulation training, we introduce hand-object contact optimization with interaction-centric grasp…
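To make the idea of retargeting concrete, here is a minimal sketch of kinematic retargeting for a toy planar two-joint finger. It is not VIDEOMANIP's implementation: the link lengths, the function names, the `scale` parameter, and the choice of damped least-squares inverse kinematics are all assumptions made for illustration. Given a sequence of human fingertip positions, it finds joint angles whose fingertip matches each position, warm-starting each frame from the previous solution.

```python
import numpy as np

# Hypothetical link lengths (metres) of a toy planar two-joint robot finger.
LINK_LENGTHS = np.array([0.05, 0.04])


def fingertip_position(joint_angles):
    """Forward kinematics: joint angles (q1, q2) -> fingertip (x, y)."""
    q1, q2 = joint_angles
    l1, l2 = LINK_LENGTHS
    return np.array([
        l1 * np.cos(q1) + l2 * np.cos(q1 + q2),
        l1 * np.sin(q1) + l2 * np.sin(q1 + q2),
    ])


def fingertip_jacobian(joint_angles):
    """Analytic 2x2 Jacobian of fingertip_position with respect to the angles."""
    q1, q2 = joint_angles
    l1, l2 = LINK_LENGTHS
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [l1 * np.cos(q1) + l2 * np.cos(q1 + q2), l2 * np.cos(q1 + q2)],
    ])


def retarget_frame(target_tip, initial_angles, steps=200, damping=1e-5, max_step=0.2):
    """Damped least-squares iterations that pull the robot fingertip to target_tip."""
    q = np.array(initial_angles, dtype=float)
    for _ in range(steps):
        error = target_tip - fingertip_position(q)
        jac = fingertip_jacobian(q)
        # Solve (J^T J + damping * I) dq = J^T error for the joint update.
        dq = np.linalg.solve(jac.T @ jac + damping * np.eye(2), jac.T @ error)
        # Clip the update so a single iteration never moves the joints too far.
        norm = np.linalg.norm(dq)
        if norm > max_step:
            dq *= max_step / norm
        q += dq
    return q


def retarget_trajectory(human_tips, scale=1.0, start_angles=(0.3, 0.3)):
    """Retarget each frame, warm-starting from the previous frame's solution."""
    q = np.array(start_angles, dtype=float)
    angles = []
    for tip in human_tips:
        q = retarget_frame(scale * np.asarray(tip, dtype=float), q)
        angles.append(q.copy())
    return np.array(angles)
```

A real dexterous hand has many more joints and fingertips, and contact with the object also has to be respected, which is presumably what the contact optimization described in the abstract addresses; this sketch only matches fingertip positions.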
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Hand Gesture Recognition Systems
