$\Psi_0$: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation
Songlin Wei, Hongyi Jing, Boqian Li, Zhenyu Zhao, Jiageng Mao, Zhenhao Ni, Sicheng He, Jie Liu, Xiawei Liu, Kaidi Kang, Sheng Zang, Weiduo Yuan, Marco Pavone, Di Huang, Yue Wang

TL;DR
$\Psi_0$ is an open humanoid foundation model trained through staged learning on high-quality human videos and robot data, achieving superior loco-manipulation performance with less data than previous methods.
Contribution
The paper introduces $\Psi_0$, a novel staged training paradigm that decouples learning from heterogeneous data sources for humanoid loco-manipulation.
Findings
Achieves over 40% improvement in success rate over baselines.
Uses only 800 hours of human videos and 30 hours of robot data.
Outperforms models trained on ten times more data.
Abstract
We introduce $\Psi_0$ (Psi-Zero), an open foundation model to address challenging humanoid loco-manipulation tasks. While existing approaches often attempt to address this fundamental problem by co-training on large and diverse human and humanoid data, we argue that this strategy is suboptimal due to the fundamental kinematic and motion disparities between humans and humanoid robots. Therefore, data efficiency and model performance remain unsatisfactory despite the considerable data volume. To address this challenge, $\Psi_0$ decouples the learning process to maximize the utility of heterogeneous data sources. Specifically, we propose a staged training paradigm with different learning objectives: First, we autoregressively pre-train a VLM backbone on large-scale egocentric human videos to acquire generalizable visual-action representations. Then, we post-train a flow-based action expert…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning
