ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments

Ziyang Gong; Zehang Luo; Anke Tang; Zhe Liu; Shi Fu; Zhi Hou; Ganlin Yang; Weiyun Wang; Xiaofeng Wang; Jianbo Liu; Gen Luo; Haolan Kang; Shuang Luo; Yue Zhou; Yong Luo; Li Shen; Xiaosong Jia; Yao Mu; Xue Yang; Chunxiao Liu; Junchi Yan; Hengshuang Zhao; Dacheng Tao; Xiaogang Wang

arXiv:2603.03198 · cs.RO · March 4, 2026

TL;DR

ACE-Brain-0 introduces a unified multimodal large language model leveraging spatial intelligence as a universal foundation to improve generalization across diverse embodied AI systems like vehicles, robots, and UAVs.

Contribution

The paper proposes the SSR paradigm and a multimodal model that unifies spatial reasoning with domain-specific expertise for cross-embodiment generalization.

Findings

01 Achieves state-of-the-art performance on 24 benchmarks.

02 Effectively balances universal generalization with domain-specific skills.

03 Demonstrates the importance of spatial intelligence as a shared scaffold.

Abstract

Universal embodied intelligence demands robust generalization across heterogeneous embodiments, such as autonomous driving, robotics, and unmanned aerial vehicles (UAVs). However, training a unified model over diverse embodiments, as existing embodied brains do, frequently runs into long-tail data, gradient interference, and catastrophic forgetting, making it notoriously difficult to balance universal generalization with domain-specific proficiency. In this report, we introduce ACE-Brain-0, a generalist foundation brain that unifies spatial reasoning, autonomous driving, and embodied manipulation within a single multimodal large language model (MLLM). Our key insight is that spatial intelligence serves as a universal scaffold across diverse physical embodiments: although vehicles, robots, and UAVs differ drastically in morphology, they share a common need for modeling 3D mental space, making…

Code & Models

Models

🤗 ACE-Brain/ACE-Brain-0-8B
model · 456 downloads · 8 likes

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning