HoloBrain-0 Technical Report

Xuewu Lin; Tianwei Lin; Yun Du; Hongyu Xie; Yiwei Jin; Jiawei Li; Shijie Wu; Qingze Wang; Mengdi Li; Mengao Zhao; Ziang Li; Chaodong Huang; Hongzhe Bi; Lichao Huang; Zhizhong Su

arXiv:2602.12062 · cs.RO · February 13, 2026

TL;DR

HoloBrain-0 introduces a novel vision-language-action framework for robots that incorporates embodiment priors, achieves state-of-the-art results, and is fully open-sourced to facilitate research and deployment.

Contribution

The paper presents HoloBrain-0, a new VLA architecture that explicitly models robot embodiment and demonstrates scalable pre-training and deployment capabilities.

Findings

01. State-of-the-art results on RoboTwin 2.0, LIBERO, and GenieSim benchmarks.
02. Efficient 0.2B-parameter model rivals larger baselines.
03. Open-source ecosystem supports research and practical deployment.

Abstract

In this work, we introduce HoloBrain-0, a comprehensive Vision-Language-Action (VLA) framework that bridges the gap between foundation model research and reliable real-world robot deployment. The core of our system is a novel VLA architecture that explicitly incorporates robot embodiment priors, including multi-view camera parameters and kinematic descriptions (URDF), to enhance 3D spatial reasoning and support diverse embodiments. We validate this design through a scalable "pre-train then post-train" paradigm, achieving state-of-the-art results on simulation benchmarks such as RoboTwin 2.0, LIBERO, and GenieSim, as well as strong results on challenging long-horizon real-world manipulation tasks. Notably, our efficient 0.2B-parameter variant rivals significantly larger baselines, enabling low-latency on-device deployment. To further accelerate research and practical adoption, we fully…
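
To make "multi-view camera parameters" concrete, the sketch below shows the standard pinhole projection that such parameters define: a 3D point in the world frame is mapped into a camera's pixel coordinates using that view's extrinsics (rotation, translation) and intrinsics (focal lengths, principal point). This is a generic illustration, not HoloBrain-0's implementation; the abstract does not say how the model consumes these parameters, and the `Camera` class and `project` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    """Hypothetical container for one view's calibration (not the paper's API)."""
    fx: float                    # focal length in pixels, x axis
    fy: float                    # focal length in pixels, y axis
    cx: float                    # principal point, x
    cy: float                    # principal point, y
    rotation: List[List[float]]  # 3x3 world-to-camera rotation matrix
    translation: Vec3            # world-to-camera translation


def project(point_world: Vec3, cam: Camera) -> Optional[Tuple[float, float]]:
    """Project a world-frame point to pixel coordinates with a pinhole model.

    Returns None when the point lies on or behind the image plane.
    """
    # Extrinsics: p_cam = R @ p_world + t
    x, y, z = (
        sum(cam.rotation[i][j] * point_world[j] for j in range(3))
        + cam.translation[i]
        for i in range(3)
    )
    if z <= 0:
        return None
    # Intrinsics: perspective divide, then scale and shift into pixels
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


# Two assumed views of the same scene: one at the world origin,
# one shifted so the point ends up 0.5 m further left in its frame.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
front = Camera(500.0, 500.0, 320.0, 240.0, identity, (0.0, 0.0, 0.0))
side = Camera(500.0, 500.0, 320.0, 240.0, identity, (-0.5, 0.0, 0.0))

point = (0.5, -0.25, 2.0)
pixels = [project(point, cam) for cam in (front, side)]
```

With both views calibrated, the same 3D point yields a distinct pixel location per camera, which is the geometric relationship a model can exploit when camera parameters are supplied as an explicit prior rather than left to be inferred from images.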

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Social Robot Interaction and HRI