ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation

Zedong Chu; Shichao Xie; Xiaolong Wu; Yanfen Shen; Minghua Luo; Zhengbo Wang; Fei Liu; Xiaoxu Leng; Junjun Hu; Mingyang Yin; Jia Lu; Yingnan Guo; Kai Yang; Jiawei Han; Xu Chen; Yanqing Zhu; Yuxiang Zhao; Xin Liu; Yirong Yang; Ye He; Jiahang Wang; Yang Cai; Tianlin Zhang; Li Gao; Liu Liu; Mingchao Sun; Fan Jiang; Chiyu Wang; Zhicheng Liu; Hongyu Pan; Honglin Han; Zhining Gu; Kuan Yang; Jianfang Zhang; Di Jing; Zihao Guan; Wei Guo; Guoqing Liu; Di Yang; Xiangpo Yang; Menglin Yang; Hongguang Xing; Weiguo Li; Mu Xu

arXiv:2602.11598 · cs.RO · February 13, 2026

Open Access

TL;DR

ABot-N0 is a unified Vision-Language-Action (VLA) foundation model for embodied navigation. A single model handles five navigation tasks and reports state-of-the-art results on 7 benchmarks, supported by a large-scale data engine and a hierarchical "Brain-Action" architecture.

Contribution

Introduces ABot-N0, a VLA foundation model that pairs a hierarchical Brain-Action architecture with a large-scale data engine, unifies five navigation tasks in one model, and reports SOTA results on 7 benchmarks.

Findings

01 Achieves new SOTA across 7 benchmarks.

02 Outperforms specialized models significantly.

03 Enables robust long-horizon navigation in real-world environments.

Abstract

Embodied navigation has long been fragmented by task-specific architectures. We introduce ABot-N0, a unified Vision-Language-Action (VLA) foundation model that achieves a "Grand Unification" across 5 core tasks: Point-Goal, Object-Goal, Instruction-Following, POI-Goal, and Person-Following. ABot-N0 utilizes a hierarchical "Brain-Action" architecture, pairing an LLM-based Cognitive Brain for semantic reasoning with a Flow Matching-based Action Expert for precise, continuous trajectory generation. To support large-scale learning, we developed the ABot-N0 Data Engine, curating 16.9M expert trajectories and 5.0M reasoning samples across 7,802 high-fidelity 3D scenes (10.7 $\mathrm{km}^{2}$). ABot-N0 achieves new SOTA performance across 7 benchmarks, significantly outperforming specialized models. Furthermore, our Agentic Navigation System integrates a planner with hierarchical topological…
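
As a rough illustration of how a flow-matching action head can turn noise into a continuous trajectory, here is a minimal Python sketch. It is not the ABot-N0 implementation: the function names (`toy_velocity`, `sample_trajectory`), the flat waypoint layout, and the closed-form velocity field are all assumptions made for this example. A real action expert would use a learned network conditioned on the Cognitive Brain's output; here the "model" is the exact velocity of a straight-line path toward one fixed target, so the sampler's behavior can be checked by hand.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For the linear path x_t = (1 - t) * x_0 + t * x_1 with a single
    fixed target x_1, the velocity at (x_t, t) is (x_1 - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_trajectory(velocity_fn, target, num_steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps.

    `target` is a flat list of waypoint coordinates (an assumed layout,
    e.g. [x1, y1, x2, y2, ...]). Sampling starts from Gaussian noise.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # noise sample x_0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so 1 - t never hits zero
        v = velocity_fn(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Two hypothetical (x, y) waypoints, flattened.
    waypoints = [0.5, 0.0, 1.0, 0.1]
    print(sample_trajectory(toy_velocity, waypoints))
```

With this toy field the final Euler step lands on the target up to floating-point error, which makes the sketch easy to sanity-check. With a learned, multimodal velocity field the same loop would instead produce different plausible trajectories for different noise seeds.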

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Reinforcement Learning in Robotics