World-Coordinate Human Motion Retargeting via SAM 3D Body
Zhangzheng Tu, Kailun Su, Shaolong Zhu, Yukun Zheng

TL;DR
This paper introduces a lightweight framework for recovering and retargeting human motion from monocular videos to humanoid robots, using SAM 3D Body and a novel optimization approach for stable, physically plausible motion reconstruction.
Contribution
It combines a frozen SAM 3D Body backbone with the low-dimensional Momentum HumanRig (MHR) representation and physical constraints (locked bone lengths, soft foot-ground contact) to retarget motion without SLAM pipelines or heavy temporal models.
Findings
Stable world trajectories achieved on real videos
Reliable robot retargeting demonstrated on Unitree G1
Efficient sliding-window optimization improves temporal consistency
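
To make the sliding-window idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the per-frame latents are stacked in a (T, D) array, and it swaps in a simple quadratic objective (stay close to the prediction, penalize acceleration) that can be solved in closed form per window. The function names, window size, stride, and weight `lam` are all hypothetical.

```python
import numpy as np


def second_difference_matrix(n: int) -> np.ndarray:
    """Return the (n-2, n) matrix whose rows compute x[t] - 2*x[t+1] + x[t+2]."""
    d2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        d2[i, i:i + 3] = (1.0, -2.0, 1.0)
    return d2


def smooth_window(z: np.ndarray, lam: float) -> np.ndarray:
    """Minimise ||x - z||^2 + lam * ||D2 x||^2 for one window (closed form)."""
    n = z.shape[0]
    if n < 3:  # too short to define an acceleration term
        return z.copy()
    d2 = second_difference_matrix(n)
    a = np.eye(n) + lam * d2.T @ d2
    return np.linalg.solve(a, z)


def sliding_window_smooth(z, window: int = 16, stride: int = 8, lam: float = 10.0):
    """Smooth a (T, D) latent sequence with overlapping windows, then average overlaps."""
    z = np.asarray(z, dtype=float)
    t = z.shape[0]
    acc = np.zeros_like(z)
    count = np.zeros((t, 1))
    starts = list(range(0, max(t - window, 0) + 1, stride))
    if starts[-1] + window < t:  # make sure the tail frames are covered
        starts.append(t - window)
    for s in starts:
        e = min(s + window, t)
        acc[s:e] += smooth_window(z[s:e], lam)
        count[s:e] += 1.0
    return acc / count


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.linspace(0.0, 1.0, 43)[:, None] * np.ones((1, 4))
    noisy = clean + 0.05 * rng.standard_normal(clean.shape)
    smoothed = sliding_window_smooth(noisy)
    print(smoothed.shape)
```

Because each window is a small linear solve, the cost grows linearly with sequence length, which is one plausible reading of "efficient" here. A constant-velocity trajectory passes through unchanged, since its second differences are zero.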
Abstract
Recovering world-coordinate human motion from monocular videos and retargeting it to humanoid robots is important for embodied intelligence and robotics. To avoid complex SLAM pipelines or heavy temporal models, we propose a lightweight, engineering-oriented framework that leverages SAM 3D Body (3DB) as a frozen perception backbone and uses the Momentum HumanRig (MHR) representation as a robot-friendly intermediate. Our method (i) locks the identity and skeleton-scale parameters of each tracked subject to enforce temporally consistent bone lengths, (ii) smooths per-frame predictions via efficient sliding-window optimization in the low-dimensional MHR latent space, and (iii) recovers physically plausible global root trajectories with a differentiable soft foot-ground contact model and contact-aware global optimization. Finally, we retarget the reconstructed motion to the Unitree G1 humanoid…
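
As an illustration of what a soft foot-ground contact term can look like, the sketch below scores each frame with a smooth contact probability and penalizes sliding and floating in proportion to it. This is an assumed formulation, not the paper's model: the sigmoid gating, the thresholds `h0` and `v0`, the temperatures `tau_h` and `tau_v`, and the ground plane at z = 0 are all hypothetical. It is written in NumPy for readability; the same expressions are smooth and could be differentiated in an autodiff framework.

```python
import numpy as np


def sigmoid(x):
    """Numerically stable logistic function."""
    return np.exp(-np.logaddexp(0.0, -np.asarray(x, dtype=float)))


def soft_contact_prob(foot_pos, dt, h0=0.05, v0=0.2, tau_h=0.01, tau_v=0.05):
    """Per-frame contact probability for one foot.

    foot_pos: (T, 3) world positions in metres, z up, ground assumed at z = 0.
    The probability is high only when the foot is both low and slow.
    """
    foot_pos = np.asarray(foot_pos, dtype=float)
    vel = np.gradient(foot_pos, dt, axis=0)      # finite-difference velocity
    speed = np.linalg.norm(vel, axis=1)
    height = foot_pos[:, 2]
    return sigmoid((h0 - height) / tau_h) * sigmoid((v0 - speed) / tau_v)


def contact_loss(foot_pos, dt, **kwargs):
    """Contact-weighted penalty on horizontal sliding and height above ground."""
    foot_pos = np.asarray(foot_pos, dtype=float)
    p = soft_contact_prob(foot_pos, dt, **kwargs)
    vel = np.gradient(foot_pos, dt, axis=0)
    slide = np.sum(vel[:, :2] ** 2, axis=1)      # horizontal speed squared
    floating = foot_pos[:, 2] ** 2               # squared height above ground
    return float(np.sum(p * (slide + floating)))


if __name__ == "__main__":
    dt = 1.0 / 30.0
    planted = np.zeros((10, 3))                  # foot resting on the ground
    sliding = planted.copy()
    sliding[:, 0] = 0.1 * dt * np.arange(10)     # drifting at 0.1 m/s
    print(contact_loss(planted, dt), contact_loss(sliding, dt))
```

Using a soft probability instead of a hard contact label keeps the penalty smooth, so a global root-trajectory optimizer can trade off contact consistency against the per-frame pose estimates without discrete switching.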
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Robot Manipulation and Learning
