HoloBrain-0 Technical Report
Xuewu Lin, Tianwei Lin, Yun Du, Hongyu Xie, Yiwei Jin, Jiawei Li, Shijie Wu, Qingze Wang, Mengdi Li, Mengao Zhao, Ziang Li, Chaodong Huang, Hongzhe Bi, Lichao Huang, Zhizhong Su

TL;DR
HoloBrain-0 introduces a novel vision-language-action framework for robots that incorporates embodiment priors, achieves state-of-the-art results, and is fully open-sourced to facilitate research and deployment.
Contribution
The paper presents HoloBrain-0, a new VLA architecture that explicitly models robot embodiment and demonstrates scalable pre-training and deployment capabilities.
Findings
State-of-the-art results on RoboTwin 2.0, LIBERO, and GenieSim benchmarks.
Efficient 0.2B-parameter model rivals larger baselines.
Open-source ecosystem supports research and practical deployment.
Abstract
In this work, we introduce HoloBrain-0, a comprehensive Vision-Language-Action (VLA) framework that bridges the gap between foundation model research and reliable real-world robot deployment. The core of our system is a novel VLA architecture that explicitly incorporates robot embodiment priors, including multi-view camera parameters and kinematic descriptions (URDF), to enhance 3D spatial reasoning and support diverse embodiments. We validate this design through a scalable "pre-train then post-train" paradigm, achieving state-of-the-art results on simulation benchmarks such as RoboTwin 2.0, LIBERO, and GenieSim, as well as strong results on challenging long-horizon real-world manipulation tasks. Notably, our efficient 0.2B-parameter variant rivals significantly larger baselines, enabling low-latency on-device deployment. To further accelerate research and practical adoption, we fully…
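The abstract names multi-view camera parameters as one of the embodiment priors but does not say how the model consumes them. As a generic illustration only (not the HoloBrain-0 implementation), the sketch below shows what such parameters encode: a standard pinhole projection that maps one 3D point, such as an end-effector position, to pixel coordinates in each view. The `Camera` class, the `project` function, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    """Hypothetical pinhole camera: intrinsics plus a world-to-camera pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    rotation: List[List[float]]  # 3x3 world-to-camera rotation matrix
    translation: Vec3            # world-to-camera translation


def project(cam: Camera, point_world: Vec3) -> Optional[Tuple[float, float]]:
    """Project a world-frame point to pixels; return None if it is behind the camera."""
    # Extrinsics: move the point from the world frame into the camera frame.
    x, y, z = (
        sum(cam.rotation[i][j] * point_world[j] for j in range(3)) + cam.translation[i]
        for i in range(3)
    )
    if z <= 0:
        return None
    # Intrinsics: perspective divide, then scale and shift into pixel units.
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


# Two made-up views that differ only by a 10 cm shift along the camera x-axis.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
front = Camera(500.0, 500.0, 320.0, 240.0, identity, (0.0, 0.0, 0.0))
side = Camera(500.0, 500.0, 320.0, 240.0, identity, (0.1, 0.0, 0.0))

# An assumed end-effector position, e.g. obtained from forward kinematics.
gripper = (0.0, 0.0, 1.0)
pixels = [project(cam, gripper) for cam in (front, side)]
```

The same 3D point lands at different pixels in each view, which is why per-camera intrinsics and extrinsics are needed to relate image observations to a shared 3D frame.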
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Social Robot Interaction and HRI
