GigaBrain-0: A World Model-Powered Vision-Language-Action Model
GigaBrain Team: Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Haoyun Li, Jie Li, Jiagang Zhu, Lv Feng, Peng Li, Qiuping Deng, Runqi Ouyang, Wenkang Qin, Xinze Chen, Xiaofeng Wang, Yang Wang, Yifan Li, Yilong Li, Yiran Ding, Yuan Xu, Yun Ye, Yukun Zhou

TL;DR
GigaBrain-0 is a versatile vision-language-action model for robots that leverages world model-generated data to enhance generalization, reduce real-world data needs, and improve robustness in complex manipulation tasks.
Contribution
The paper introduces GigaBrain-0, a novel foundation model that uses synthetic data from world models to improve robot task generalization and robustness, reducing dependence on real data.
Findings
Outperforms existing models in generalization across appearance and viewpoint variations.
Achieves superior real-world performance on dexterous and long-horizon tasks.
Includes a lightweight version suitable for edge devices like NVIDIA Jetson.
Abstract
Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robot Manipulation and Learning
