Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
Hao Luo, Yicheng Feng, Wanpeng Zhang, Sipeng Zheng, Ye Wang, Haoqi Yuan, Jiazheng Liu, Chaoyi Xu, Qin Jin, Zongqing Lu

TL;DR
Being-H0 is a large-scale vision-language-action model trained on human videos, enabling precise hand motion generation and improved robotic manipulation through physical instruction tuning and diverse data integration.
Contribution
It introduces a novel training paradigm combining large-scale human video pretraining, physical space alignment, and part-level motion tokenization for dexterous manipulation.
Findings
Superior hand motion generation accuracy
Effective instruction following capabilities
Scalable performance with larger models and data
Abstract
We introduce Being-H0, a dexterous Vision-Language-Action model (VLA) trained on large-scale human videos. Existing VLAs struggle with complex manipulation tasks requiring high dexterity and generalize poorly to novel scenarios and tasks, primarily due to their reliance on synthetic data with significant sim-to-real gaps or teleoperated demonstrations lacking scale and diversity. To address this data bottleneck, we propose leveraging human hands as a foundation manipulator, capitalizing on the rich dexterity and scalability present in web data. Our approach centers on physical instruction tuning, a novel training paradigm that combines large-scale VLA pretraining from human videos, physical space alignment for 3D reasoning, and post-training adaptation for robotic tasks. Additionally, we introduce a part-level motion tokenization method which achieves millimeter-level reconstruction…
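The abstract names a part-level motion tokenization method but gives no details of how it works. The sketch below is only a rough illustration under stated assumptions. It splits a hand-motion feature vector into parts using a hypothetical wrist/finger layout with made-up feature sizes. It then discretizes each part with its own residual vector quantizer, where every level encodes what the previous levels missed.

The `GRVQ` in the released `Being-H0-GRVQ-8K-2508` checkpoint name suggests a grouped residual VQ, but reading it that way is an assumption. `PARTS`, `ResidualVQ`, and `PartLevelTokenizer` are hypothetical names. The codebooks are random rather than trained, so no reconstruction accuracy should be inferred from this code.

```python
import numpy as np

# Hypothetical part layout (assumption): 9 wrist features + 45 finger features per frame.
PARTS = {"wrist": slice(0, 9), "fingers": slice(9, 54)}


class ResidualVQ:
    """Residual vector quantizer: each level quantizes the residual left by earlier levels."""

    def __init__(self, dim, num_levels=3, codebook_size=256, seed=0):
        rng = np.random.default_rng(seed)
        # Random, untrained codebooks with shrinking scale per level (illustration only).
        self.codebooks = [rng.normal(size=(codebook_size, dim)) * 0.5 ** level
                          for level in range(num_levels)]

    def encode(self, x):
        """x: (T, dim) float array -> codes: (T, num_levels) int array."""
        residual = x.copy()
        codes = []
        for cb in self.codebooks:
            # Squared distance from each frame's residual to every codeword.
            dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
            idx = dists.argmin(axis=1)
            codes.append(idx)
            residual = residual - cb[idx]  # pass what is left to the next level
        return np.stack(codes, axis=1)

    def decode(self, codes):
        """Reconstruct by summing the chosen codeword from every level."""
        return sum(cb[codes[:, level]] for level, cb in enumerate(self.codebooks))


class PartLevelTokenizer:
    """One residual quantizer per body part, so each part gets its own token stream."""

    def __init__(self, parts=PARTS, **vq_kwargs):
        self.parts = parts
        self.quantizers = {name: ResidualVQ(sl.stop - sl.start, **vq_kwargs)
                           for name, sl in parts.items()}

    def encode(self, motion):
        return {name: self.quantizers[name].encode(motion[:, sl])
                for name, sl in self.parts.items()}

    def decode(self, tokens, dim):
        num_frames = next(iter(tokens.values())).shape[0]
        out = np.zeros((num_frames, dim))
        for name, sl in self.parts.items():
            out[:, sl] = self.quantizers[name].decode(tokens[name])
        return out


# Usage: 16 frames of random stand-in hand features (not real data).
motion = np.random.default_rng(1).normal(size=(16, 54))
tokenizer = PartLevelTokenizer()
tokens = tokenizer.encode(motion)
recon = tokenizer.decode(tokens, motion.shape[1])
print(tokens["wrist"].shape, tokens["fingers"].shape)  # (16, 3) (16, 3)
```

Giving each part its own codebooks lets wrist and finger motion be tokenized at different granularities. Adding residual levels trades more tokens per frame for a finer reconstruction. Whether Being-H0 makes these choices is not stated in this excerpt.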
Code & Models
- 🤗 BeingBeyond/Being-H05-2B (model) · 113 downloads · 13 likes
- 🤗 BeingBeyond/Being-H0 (model) · 11 likes
- 🤗 BeingBeyond/Being-H0-8B-2508 (model) · 2 downloads · 3 likes
- 🤗 BeingBeyond/Being-H0-1B-2508 (model) · 4 downloads · 4 likes
- 🤗 BeingBeyond/Being-H0-14B-2508 (model) · 3 downloads · 3 likes
- 🤗 BeingBeyond/Being-H0-8B-Align-2508 (model) · 2 downloads · 1 like
- 🤗 BeingBeyond/Being-H0-GRVQ-8K-2508 (model) · 2 likes
- 🤗 BeingBeyond/Being-H05-2B_libero (model) · 23 downloads · 3 likes
- 🤗 BeingBeyond/Being-H05-2B_robocasa (model) · 34 downloads · 3 likes
- 🤗 BeingBeyond/Being-H05-2B_libero_robocasa (model) · 11 downloads · 5 likes
