Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer

Haoru Xue; Tairan He; Zi Wang; Qingwei Ben; Wenli Xiao; Zhengyi Luo; Xingye Da; Fernando Castañeda; Guanya Shi; Shankar Sastry; Linxi "Jim" Fan; Yuke Zhu

arXiv:2512.01061·cs.RO·December 2, 2025

Open Access

TL;DR

This paper presents a sim-to-real transfer framework for humanoid robots that achieves robust, zero-shot articulated-door manipulation from RGB input alone and outperforms human teleoperators by up to 31.7% in task completion time.

Contribution

It introduces a staged-reset exploration strategy for privileged-policy training and a GRPO-based fine-tuning procedure for vision-based humanoid control, enabling effective sim-to-real transfer in complex loco-manipulation tasks.

Findings

01. Zero-shot sim-to-real transfer achieved for humanoid door manipulation.

02. Policy outperforms human teleoperators in task completion time.

03. First humanoid policy using only RGB perception for diverse articulated tasks.

Abstract

Recent progress in GPU-accelerated, photorealistic simulation has opened a scalable data-generation path for robot learning, where massive physics and visual randomization allow policies to generalize beyond curated environments. Building on these advances, we develop a teacher-student-bootstrap learning framework for vision-based humanoid loco-manipulation, using articulated-object interaction as a representative high-difficulty benchmark. Our approach introduces a staged-reset exploration strategy that stabilizes long-horizon privileged-policy training, and a GRPO-based fine-tuning procedure that mitigates partial observability and improves closed-loop consistency in sim-to-real RL. Trained entirely on simulation data, the resulting policy achieves robust zero-shot performance across diverse door types and outperforms human teleoperators by up to 31.7% in task completion time under…
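The abstract names a GRPO-based fine-tuning procedure without giving details on this page. As a rough illustration of the general idea behind GRPO (Group Relative Policy Optimization), the sketch below scores each rollout relative to the other rollouts in its group instead of using a learned value function. This is not the authors' implementation: the function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group's mean and spread.

    Illustrative sketch only: the choice of population standard deviation
    and the eps value are assumptions, not taken from the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical returns of four rollouts sampled from the same start state.
group = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(group)
```

Rollouts that beat the group average receive a positive advantage and those below it a negative one, so the policy update favors the better attempts within each group.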

Taxonomy

Topics: Robot Manipulation and Learning · Human Motion and Animation · Human Pose and Action Recognition