From Generated Human Videos to Physically Plausible Robot Trajectories

James Ni; Zekai Wang; Wei Lin; Amir Bar; Yann LeCun; Trevor Darrell; Jitendra Malik; Roei Herzig

arXiv:2512.05094 · cs.RO · December 12, 2025

TL;DR

This paper presents a pipeline that converts generated human videos into physically plausible robot trajectories, so that a humanoid robot can imitate the generated actions zero-shot. It pairs a physics-aware reinforcement learning policy with a new benchmark.

Contribution

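It introduces a two-stage pipeline. The first stage lifts video pixels into a 4D human representation and retargets it to the humanoid morphology. The second stage is GenMimic, a physics-aware reinforcement learning policy conditioned on 3D keypoints. It also introduces a new benchmark for zero-shot generalization.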

Findings

01. Improved simulation performance over baselines
02. Physically stable motion tracking on a humanoid robot
03. Effective zero-shot imitation from noisy generated videos

Abstract

Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts, holding the potential to serve as high-level planners for contextual robot control. To realize this potential, a key research question remains open: how can a humanoid execute the human actions from generated videos in a zero-shot manner? This challenge arises because generated videos are often noisy and exhibit morphological distortions that make direct imitation difficult compared to real video. To address this, we introduce a two-stage pipeline. First, we lift video pixels into a 4D human representation and then retarget to the humanoid morphology. Second, we propose GenMimic, a physics-aware reinforcement learning policy conditioned on 3D keypoints and trained with symmetry regularization and keypoint-weighted tracking rewards. As a result, GenMimic can mimic human actions…
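
The abstract names two training ingredients, keypoint-weighted tracking rewards and symmetry regularization, without giving their formulas in the text available here. The following is a minimal sketch of what such terms commonly look like, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the exponential reward shape, the `sigma` length scale, the weight values, and the mirror functions.

```python
import numpy as np


def keypoint_weighted_tracking_reward(robot_kp, ref_kp, weights, sigma=0.25):
    """Reward in (0, 1] that equals 1 when robot keypoints match the reference.

    robot_kp, ref_kp: (K, 3) arrays of 3D keypoint positions in meters.
    weights: (K,) non-negative importance per keypoint (hypothetical values).
    sigma: assumed length scale in meters controlling how fast reward decays.
    """
    robot_kp = np.asarray(robot_kp, dtype=float)
    ref_kp = np.asarray(ref_kp, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    # Squared Euclidean distance for each keypoint, shape (K,).
    sq_err = np.sum((robot_kp - ref_kp) ** 2, axis=-1)
    return float(np.exp(-np.dot(w, sq_err) / sigma**2))


def symmetry_loss(policy, obs, mirror_obs, mirror_act):
    """Mean squared gap between policy(mirror(obs)) and mirror(policy(obs)).

    A left-right symmetric policy makes this zero. The mirror functions are
    supplied by the caller because they depend on the robot's joint layout.
    """
    act_of_mirrored = np.asarray(policy(mirror_obs(obs)), dtype=float)
    mirrored_act = np.asarray(mirror_act(policy(obs)), dtype=float)
    return float(np.mean((act_of_mirrored - mirrored_act) ** 2))
```

In this sketch, a larger weight on a keypoint makes an error at that keypoint cost more reward than the same error elsewhere, and the symmetry term would be added to the policy loss with a coefficient chosen by the user.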

Code & Models

Datasets

wlin21at/GenMimicBench (dataset · 23 downloads)

Taxonomy

Topics: Robot Manipulation and Learning · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis