Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer
Haoru Xue, Tairan He, Zi Wang, Qingwei Ben, Wenli Xiao, Zhengyi Luo, Xingye Da, Fernando Castañeda, Guanya Shi, Shankar Sastry, Linxi "Jim" Fan, Yuke Zhu

TL;DR
This paper presents a simulation-to-real transfer framework for humanoid robots that enables robust, zero-shot articulated door manipulation using only RGB input, outperforming human teleoperators by up to 31.7% in task completion time.
Contribution
It introduces a staged-reset exploration strategy and a GRPO-based fine-tuning procedure for vision-based humanoid control, enabling effective sim-to-real transfer in complex loco-manipulation tasks.
Findings
Zero-shot sim-to-real transfer achieved for humanoid door manipulation.
Policy outperforms human teleoperators by up to 31.7% in task completion time.
First humanoid policy using only RGB perception for diverse articulated tasks.
Abstract
Recent progress in GPU-accelerated, photorealistic simulation has opened a scalable data-generation path for robot learning, where massive physics and visual randomization allow policies to generalize beyond curated environments. Building on these advances, we develop a teacher-student-bootstrap learning framework for vision-based humanoid loco-manipulation, using articulated-object interaction as a representative high-difficulty benchmark. Our approach introduces a staged-reset exploration strategy that stabilizes long-horizon privileged-policy training, and a GRPO-based fine-tuning procedure that mitigates partial observability and improves closed-loop consistency in sim-to-real RL. Trained entirely on simulation data, the resulting policy achieves robust zero-shot performance across diverse door types and outperforms human teleoperators by up to 31.7% in task completion time under…
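The abstract names a GRPO-based fine-tuning procedure without giving its details here. As a rough illustration of the general idea behind GRPO only, and not the authors' implementation, the sketch below computes group-relative advantages: each rollout's return is normalized against the mean and standard deviation of its own group, so no learned value function is needed. The function name, the `eps` parameter, and the example returns are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's return against its own group (GRPO-style).

    `rewards` holds the scalar returns of rollouts that share a starting
    condition. `eps` is a hypothetical stabilizer that avoids division by
    zero when every rollout in the group earns the same return.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical returns: two rollouts opened the door (1.0), two did not (0.0).
group = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group)
```

Successful rollouts receive positive advantages and failed ones negative, which is the signal the policy update would then reinforce. A group in which every rollout earns the same return yields all-zero advantages and so contributes no gradient.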
Taxonomy
Topics: Robot Manipulation and Learning · Human Motion and Animation · Human Pose and Action Recognition
