GigaBrain-0: A World Model-Powered Vision-Language-Action Model

GigaBrain Team: Angen Ye; Boyuan Wang; Chaojun Ni; Guan Huang; Guosheng Zhao; Haoyun Li; Jie Li; Jiagang Zhu; Lv Feng; Peng Li; Qiuping Deng; Runqi Ouyang; Wenkang Qin; Xinze Chen; Xiaofeng Wang; Yang Wang; Yifan Li; Yilong Li; Yiran Ding; Yuan Xu; Yun Ye; Yukun Zhou; Zhehao Dong; Zhenan Wang; Zhichao Liu; Zheng Zhu

arXiv:2510.19430·cs.RO·December 5, 2025

TL;DR

GigaBrain-0 is a versatile vision-language-action model for robots that leverages world model-generated data to enhance generalization, reduce real-world data needs, and improve robustness in complex manipulation tasks.

Contribution

The paper introduces GigaBrain-0, a novel foundation model that uses synthetic data from world models to improve robot task generalization and robustness, reducing dependence on real data.

Findings

1. Outperforms existing models in generalization across appearance and viewpoint variations.

2. Achieves superior real-world performance on dexterous and long-horizon tasks.

3. Includes a lightweight version suitable for edge devices like NVIDIA Jetson.

Abstract

Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robot Manipulation and Learning