ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video
Haoran Yang, Jiacheng Bao, Yucheng Xin, Haoming Song, Yuyang Tian, Bin Zhao, Dong Wang, Xuelong Li

TL;DR
ZeroWBC introduces a framework that learns natural humanoid robot control directly from human egocentric videos, removing the need for costly teleoperation data and enabling versatile, natural scene-interaction behaviors.
Contribution
The paper fine-tunes a vision-language model to predict human motions from egocentric videos and retargets the predicted motions for humanoid control, bypassing the need for teleoperation data.
Findings
Outperforms baseline methods in motion naturalness and versatility
Successfully controls humanoid robots for natural scene interactions
Eliminates teleoperation data collection overhead
Abstract
Achieving versatile and naturalistic whole-body control for humanoid robot scene-interaction remains a significant challenge. While some recent works have demonstrated autonomous humanoid interactive control, they are constrained to rigid locomotion patterns and expensive teleoperation data collection, lacking the versatility to execute more human-like natural behaviors such as sitting or kicking. Furthermore, acquiring the necessary real robot teleoperation data is prohibitively expensive and time-consuming. To address these limitations, we introduce ZeroWBC, a novel framework that learns a natural humanoid visuomotor control policy directly from human egocentric videos, eliminating the need for large-scale robot teleoperation data and enabling natural humanoid robot scene-interaction control. Specifically, our approach first fine-tunes a Vision-Language Model (VLM) to predict future…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning
