$\Psi_0$: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation

Songlin Wei; Hongyi Jing; Boqian Li; Zhenyu Zhao; Jiageng Mao; Zhenhao Ni; Sicheng He; Jie Liu; Xiawei Liu; Kaidi Kang; Sheng Zang; Weiduo Yuan; Marco Pavone; Di Huang; Yue Wang

arXiv:2603.12263 · cs.RO · March 13, 2026

TL;DR

$\Psi_0$ is an open humanoid foundation model trained in stages on high-quality human videos and robot data. It achieves stronger loco-manipulation performance than previous methods while using less data.

Contribution

The paper introduces $\Psi_0$, an open humanoid foundation model, together with a staged training paradigm that decouples learning from heterogeneous data sources (human videos and robot data) for humanoid loco-manipulation.

Findings

01. Achieves over 40% improvement in success rate over baselines.
02. Uses only 800 hours of human videos and 30 hours of robot data.
03. Outperforms models trained on ten times more data.

Abstract

We introduce $\Psi_0$ (Psi-Zero), an open foundation model for challenging humanoid loco-manipulation tasks. Existing approaches often tackle this problem by co-training on large and diverse human and humanoid data; we argue that this strategy is suboptimal because of the fundamental kinematic and motion disparities between humans and humanoid robots. As a result, data efficiency and model performance remain unsatisfactory despite the considerable data volume. To address this challenge, $\Psi_0$ decouples the learning process to maximize the utility of heterogeneous data sources. Specifically, we propose a staged training paradigm with different learning objectives: first, we autoregressively pre-train a vision-language model (VLM) backbone on large-scale egocentric human videos to acquire generalizable visual-action representations. Then, we post-train a flow-based action expert…
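The abstract names a flow-based action expert but is cut off before describing it. As a rough illustration only, here is a generic flow-matching objective with a linear (rectified-flow style) interpolant, plus forward-Euler sampling, in plain Python. It is not the authors' implementation. Conditioning on VLM features is omitted, the convention of noise at t = 0 and action at t = 1 is an assumption, and the names `flow_matching_loss`, `sample_action`, and `make_oracle_velocity` are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]
# A velocity field v(x_t, t). In a real action expert this would be a network
# conditioned on VLM features; here it is any plain Python callable.
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(noise: Vector, action: Vector, t: float) -> Vector:
    """Linear path: noise at t = 0, action at t = 1 (assumed convention)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def flow_matching_loss(velocity_fn: VelocityFn, action: Vector, rng: random.Random) -> float:
    """One-sample flow-matching loss (mean squared error) for one action vector."""
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()  # t ~ Uniform[0, 1)
    x_t = interpolate(noise, action, t)
    # Along the linear path the target velocity is constant: action - noise.
    target = [a - n for n, a in zip(noise, action)]
    pred = velocity_fn(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(action)


def sample_action(velocity_fn: VelocityFn, dim: int, steps: int, rng: random.Random) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 toward t = 1 with forward Euler."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(action: Vector) -> VelocityFn:
    """Exact velocity field when every training sample equals `action`.

    Only valid for t < 1, which covers both the loss and the Euler sampler above.
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(a - xi) / (1.0 - t) for a, xi in zip(action, x)]
    return velocity


if __name__ == "__main__":
    rng = random.Random(0)
    action = [0.5, -1.0, 2.0]
    oracle = make_oracle_velocity(action)
    print(flow_matching_loss(oracle, action, rng))  # close to 0
    print(sample_action(oracle, len(action), 10, rng))  # close to `action`
```

With the oracle field the loss is zero up to rounding, and a 10-step Euler solve lands on the target action, which is a quick check that the interpolant and the target velocity agree. In a trained action expert, `velocity_fn` would instead be learned and conditioned on observations and language.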

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning