ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning

Yandan Yang; Shuang Zeng; Tong Lin; Xinyuan Chang; Dekang Qi; Junjin Xiao; Haoyun Liu; Ronghan Chen; Yuzhi Chen; Dongjie Huo; Feng Xiong; Xing Wei; Zhiheng Ma; Mu Xu

arXiv:2602.11236 · cs.CV · April 15, 2026

TL;DR

ABot-M0 introduces a unified framework for robotic manipulation that leverages large-scale data, a low-dimensional action manifold, and modular perception to enhance generalization, efficiency, and stability in embodied agents.

Contribution

The paper presents a comprehensive data pipeline, the Action Manifold Hypothesis, and modular perception mechanisms, advancing general-purpose robotic manipulation with large-scale datasets and novel learning strategies.

Findings

01. Constructed UniACT-dataset with over 6 million trajectories from six datasets.

02. Unified pre-training improves transferability across robots and tasks.

03. Action Manifold Learning enhances action prediction speed and stability.

Abstract

Building general-purpose embodied agents across diverse hardware remains a central challenge in robotics, often framed as the "one-brain, many-forms" paradigm. Progress is hindered by fragmented data, inconsistent representations, and misaligned training objectives. We present ABot-M0, a framework that builds a systematic data curation pipeline while jointly optimizing model architecture and training strategies, enabling end-to-end transformation of heterogeneous raw data into unified, efficient representations. From six public datasets, we clean, standardize, and balance samples to construct UniACT-dataset, a large-scale dataset with over 6 million trajectories and 9,500 hours of data, covering diverse robot morphologies and task scenarios. Unified pre-training improves knowledge transfer and generalization across platforms and tasks, supporting general-purpose embodied intelligence.…

Code & Models

Repositories

amap-cvlab/ABot-Manipulation (GitHub)
