TL;DR
OmniHumanoid is a framework for cross-embodiment video generation that separates motion transfer from embodiment adaptation. A shared motion model is learned from motion-aligned paired videos, and a new humanoid robot is added using only unpaired videos, enabling scalable, high-fidelity motion synthesis across diverse embodiments.
Contribution
It introduces a factorized approach with shared motion models and lightweight embodiment-specific adapters, along with a new synthetic dataset for cross-embodiment learning.
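As a rough illustration of the factorization described above, the sketch below pairs one frozen shared layer with small per-embodiment adapters. Everything in it is an assumption made for illustration: the low-rank residual form of the adapter, the names `SharedMotionLayer`, `EmbodimentAdapter`, and `forward`, and the embodiment keys `robot_a` and `robot_b` are hypothetical and are not taken from the paper, which only states that the adapters are lightweight and embodiment-specific.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def num_params(M):
    """Count the entries of a matrix stored as a list of rows."""
    return sum(len(row) for row in M)


class SharedMotionLayer:
    """Hypothetical stand-in for the shared motion model: one frozen linear layer."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]


class EmbodimentAdapter:
    """Assumed low-rank residual x -> B(Ax); the paper's adapter design may differ.

    B starts at zero, so an untrained adapter leaves the shared output unchanged.
    """

    def __init__(self, dim, rank, seed=1):
        rng = random.Random(seed)
        self.A = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(dim)]

    def __call__(self, x):
        return matvec(self.B, matvec(self.A, x))


def adapter_size(adapter):
    """Number of trainable values in one adapter."""
    return num_params(adapter.A) + num_params(adapter.B)


def forward(shared, adapters, embodiment, x):
    """Shared motion features plus the residual of the selected embodiment."""
    base = matvec(shared.W, x)
    delta = adapters[embodiment](x)
    return [b + d for b, d in zip(base, delta)]


shared = SharedMotionLayer(dim=8)
adapters = {"robot_a": EmbodimentAdapter(dim=8, rank=2, seed=1)}

# Adding a new embodiment only registers a new adapter; shared.W is untouched.
adapters["robot_b"] = EmbodimentAdapter(dim=8, rank=2, seed=2)

x = [1.0] * 8
y_a = forward(shared, adapters, "robot_a", x)
```

In this toy setup each adapter holds half as many values as the shared layer, and only the adapter for the new embodiment would be updated during adaptation. In a real video model the gap would presumably be far larger, but the source gives no parameter counts.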
Findings
Achieves high motion fidelity and embodiment consistency.
Enables adaptation to new embodiments without retraining the shared model.
Performs well on synthetic and real-world benchmarks.
Abstract
Cross-embodiment video generation aims to transfer motions across different humanoid embodiments, such as human-to-robot and robot-to-robot, enabling scalable data generation for embodied intelligence. A major challenge in this setting is that motion dynamics are partly transferable across embodiments, whereas appearance and morphology remain embodiment-specific. Existing approaches often entangle these factors, and many require paired data for every target embodiment, which limits scalability to new robots. We present OmniHumanoid, a framework that factorizes transferable motion learning and embodiment-specific adaptation. Our method learns a shared motion transfer model from motion-aligned paired videos spanning multiple embodiments, while adapting to a new embodiment using only unpaired videos through lightweight embodiment-specific adapters. To reduce interference between motion…
