Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
Hao Luo, Ye Wang, Wanpeng Zhang, Sipeng Zheng, Ziheng Xi, Chaoyi Xu, Haiweng Xu, Haoqi Yuan, Chi Zhang, Yiqing Wang, Yicheng Feng, Zongqing Lu

TL;DR
Being-H0.5 is a vision-language-action model that lets robots with different morphologies learn from human data and generalize across platforms, using a unified action space and a Mixture-of-Transformers architecture.
Contribution
The paper introduces a human-centric learning paradigm, a large-scale multimodal dataset, and a Mixture-of-Transformers architecture with novel gating mechanisms for cross-embodiment robot generalization.
Findings
Achieves state-of-the-art on LIBERO and RoboCasa benchmarks.
Demonstrates strong cross-embodiment generalization on five robotic platforms.
Introduces a unified action space and novel gating techniques for robustness.
Abstract
We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction. To support this, we present UniHand-2.0, the largest embodied pre-training recipe to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments. Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, enabling low-resource robots to bootstrap skills from human data and high-resource platforms. Built upon this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm…
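The abstract describes the Unified Action Space only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a fixed-length action vector divided into named slots, where each embodiment fills only the slots it has and a mask records which entries are valid. The slot names, dimensions, and embodiment names below are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical slot layout (NOT from the paper): name -> (start, end) in the unified vector.
SLOTS: Dict[str, Tuple[int, int]] = {
    "right_eef_pose": (0, 6),    # 6-DoF end-effector or wrist pose
    "right_gripper": (6, 7),     # scalar open/close command
    "left_eef_pose": (7, 13),
    "left_gripper": (13, 14),
}
UNIFIED_DIM = 14

# Hypothetical embodiments: the slots each one fills, in its native action order.
EMBODIMENTS: Dict[str, List[str]] = {
    "human_right_hand": ["right_eef_pose"],
    "single_arm": ["right_eef_pose", "right_gripper"],
    "dual_arm": ["right_eef_pose", "right_gripper", "left_eef_pose", "left_gripper"],
}


def to_unified(embodiment: str, native_action: List[float]) -> Tuple[List[float], List[bool]]:
    """Scatter a native action into the shared slot layout.

    Returns the unified vector (zeros in unused slots) and a validity mask,
    so a training loss could ignore slots this embodiment does not have.
    """
    slot_names = EMBODIMENTS[embodiment]
    expected = sum(SLOTS[name][1] - SLOTS[name][0] for name in slot_names)
    if len(native_action) != expected:
        raise ValueError(
            f"{embodiment} expects {expected} action dims, got {len(native_action)}"
        )

    unified = [0.0] * UNIFIED_DIM
    mask = [False] * UNIFIED_DIM
    offset = 0
    for name in slot_names:
        start, end = SLOTS[name]
        for i in range(start, end):
            unified[i] = float(native_action[offset])
            mask[i] = True
            offset += 1
    return unified, mask


if __name__ == "__main__":
    vec, valid = to_unified("single_arm", [0.1, 0.2, 0.3, 0.0, 0.0, 0.0, 1.0])
    print(vec)
    print(valid)
```

In this sketch a human wrist trajectory and a single-arm robot both write into the same `right_eef_pose` slot, which is one plausible way that skills learned from human data could transfer to a robot with few demonstrations of its own.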
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning
