Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Hao Luo; Ye Wang; Wanpeng Zhang; Sipeng Zheng; Ziheng Xi; Chaoyi Xu; Haiweng Xu; Haoqi Yuan; Chi Zhang; Yiqing Wang; Yicheng Feng; Zongqing Lu

arXiv:2601.12993 · cs.RO · January 21, 2026

TL;DR

Being-H0.5 is a vision-language-action model that lets robots with different morphologies learn from human data and generalize across platforms, using a unified action space and a Mixture-of-Transformers architecture.

Contribution

The paper introduces a human-centric learning paradigm, a large-scale multimodal dataset, and a Mixture-of-Transformers architecture with novel gating mechanisms for cross-embodiment robot generalization.

Findings

01 Achieves state-of-the-art results on the LIBERO and RoboCasa benchmarks.

02 Demonstrates strong cross-embodiment generalization on five robotic platforms.

03 Introduces a unified action space and novel gating techniques for robustness.

Abstract

We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction. To support this, we present UniHand-2.0, the largest embodied pre-training recipe to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments. Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, enabling low-resource robots to bootstrap skills from human data and high-resource platforms. Built upon this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm…
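
The idea of mapping heterogeneous robot controls into semantically aligned slots can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the slot names, the slot widths, the 14-dimensional vector, and the helper names `to_unified` and `from_unified` are hypothetical and are not taken from the paper. It only shows the general pattern of scattering each embodiment's action parts into a shared fixed-width vector, with a mask marking which slots that embodiment actually uses.

```python
from typing import Dict, List, Tuple

# Hypothetical slot layout: names and widths are illustrative assumptions,
# not the layout used by Being-H0.5.
SLOT_LAYOUT: Dict[str, Tuple[int, int]] = {
    "right_wrist_pose": (0, 6),    # assumed 6-DoF pose
    "left_wrist_pose": (6, 12),    # assumed 6-DoF pose
    "right_gripper": (12, 13),     # assumed scalar open/close command
    "left_gripper": (13, 14),      # assumed scalar open/close command
}
UNIFIED_DIM = 14


def to_unified(action: Dict[str, List[float]]) -> Tuple[List[float], List[int]]:
    """Scatter named action parts into a fixed-width vector plus a validity mask."""
    vec = [0.0] * UNIFIED_DIM
    mask = [0] * UNIFIED_DIM
    for name, values in action.items():
        start, end = SLOT_LAYOUT[name]
        if len(values) != end - start:
            raise ValueError(f"slot {name!r} expects {end - start} values, got {len(values)}")
        vec[start:end] = [float(v) for v in values]
        mask[start:end] = [1] * (end - start)
    return vec, mask


def from_unified(vec: List[float], names: List[str]) -> Dict[str, List[float]]:
    """Gather the slots a given embodiment uses back out of the unified vector."""
    return {name: vec[SLOT_LAYOUT[name][0]:SLOT_LAYOUT[name][1]] for name in names}


# A single-arm robot fills only the right-side slots; the rest stay masked out.
single_arm = {"right_wrist_pose": [0.1, 0.0, 0.2, 0.0, 0.0, 0.0], "right_gripper": [1.0]}
unified, valid = to_unified(single_arm)
```

Under this sketch, a single-arm robot and a bimanual robot write to the same "right_wrist_pose" indices, so a model trained on the shared vector sees the same semantics at the same positions, and the mask lets a loss ignore slots an embodiment lacks.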

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning