X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real

Prithwish Dan; Kushal Kedia; Angela Chao; Edward Weiyi Duan; Maximus Adrian Pace; Wei-Chiu Ma; Sanjiban Choudhury

arXiv:2505.07096 · cs.RO · November 11, 2025

TL;DR

X-Sim is a cross-embodiment learning framework that uses object motion extracted from human videos to train robot manipulation policies and transfer them to the real world, without requiring robot teleoperation data.

Contribution

It proposes a real-to-sim-to-real approach that combines object-centric rewards, a diffusion policy distilled from synthetic rollouts, and online domain adaptation to learn robot policies from human videos.

Findings

Improves task progress by 30% over baselines

Matches behavior cloning with 10x less data

Generalizes to new viewpoints and test conditions

Abstract

Human videos offer a scalable way to train robot manipulation policies, but lack the action labels needed by standard imitation learning algorithms. Existing cross-embodiment approaches try to map human motion to robot actions, but often fail when the embodiments differ significantly. We propose X-Sim, a real-to-sim-to-real framework that uses object motion as a dense and transferable signal for learning robot policies. X-Sim starts by reconstructing a photorealistic simulation from an RGBD human video and tracking object trajectories to define object-centric rewards. These rewards are used to train a reinforcement learning (RL) policy in simulation. The learned policy is then distilled into an image-conditioned diffusion policy using synthetic rollouts rendered with varied viewpoints and lighting. To transfer to the real world, X-Sim introduces an online domain adaptation technique…
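As a rough illustration of what an object-centric reward could look like, the sketch below scores a simulated object position against the position tracked from the human video at the same timestep. This is a minimal sketch under stated assumptions, not the paper's reward: it uses position only (no orientation), a plain negative Euclidean distance, and hypothetical names (`object_centric_reward`, `trajectory_rewards`, `scale`).

```python
import math
from typing import List, Sequence


def object_centric_reward(sim_pos: Sequence[float],
                          target_pos: Sequence[float],
                          scale: float = 1.0) -> float:
    """Dense reward that grows as the simulated object nears the tracked target.

    Assumption: reward is the negative Euclidean distance between positions.
    The actual X-Sim reward may differ (e.g. it may include orientation terms).
    """
    return -scale * math.dist(sim_pos, target_pos)


def trajectory_rewards(sim_traj: List[Sequence[float]],
                       human_traj: List[Sequence[float]]) -> List[float]:
    """Per-timestep rewards for a simulated rollout against a tracked trajectory."""
    if len(sim_traj) != len(human_traj):
        raise ValueError("trajectories must have the same number of timesteps")
    return [object_centric_reward(s, t) for s, t in zip(sim_traj, human_traj)]


# Hypothetical object positions in metres, one per timestep.
human_traj = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.1)]
sim_traj = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
rewards = trajectory_rewards(sim_traj, human_traj)
```

Because the reward depends only on where the object is, not on how a hand or gripper moved it, the same signal applies regardless of embodiment, which is the property the abstract relies on.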

Taxonomy

Topics: Human Motion and Animation

Methods: Diffusion