RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation

Feng Yan; Fanfan Liu; Liming Zheng; Yufeng Zhong; Yiyang Huang; Zechao Guan; Chengjian Feng; Lin Ma

arXiv:2412.07215 · cs.RO · November 5, 2025

TL;DR

RoboTron-Mani is a multimodal large model for robotic manipulation, introduced together with a new dataset, RoboData. It enhances 3D perception and modality fusion and achieves state-of-the-art results across diverse tasks.

Contribution

The paper introduces RoboTron-Mani, a novel multimodal model with improved 3D perception and modality fusion, and RoboData, a comprehensive dataset integrating multiple robotic data sources.

Findings

01. Outperforms expert models on manipulation tasks.

02. Increases average sequence length on CALVIN from 1.7 to 3.5.

03. Achieves state-of-the-art results on simulated and real-world datasets.
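
This page does not define the "average sequence length" reported in finding 02. A minimal sketch follows, assuming the usual CALVIN long-horizon protocol, in which each rollout chains five language instructions and the score is the mean number of tasks completed consecutively. The function names and the rollout counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from typing import Sequence


def average_sequence_length(completed: Sequence[int], chain_length: int = 5) -> float:
    """Mean number of consecutively completed tasks per rollout.

    `completed[i]` is how many tasks rollout i finished in a row before its
    first failure (assumed to lie between 0 and `chain_length`).
    """
    if not completed:
        raise ValueError("need at least one rollout")
    for n in completed:
        if not 0 <= n <= chain_length:
            raise ValueError(f"count {n} outside [0, {chain_length}]")
    return sum(completed) / len(completed)


def success_rate_at(completed: Sequence[int], k: int) -> float:
    """Fraction of rollouts that completed at least k tasks in a row."""
    return sum(1 for n in completed if n >= k) / len(completed)


# Hypothetical rollout outcomes, NOT data from the paper.
rollouts = [5, 3, 0, 4, 2]

avg_len = average_sequence_length(rollouts)
# The average length equals the sum of the per-step success rates.
rate_sum = sum(success_rate_at(rollouts, k) for k in range(1, 6))
print(avg_len, rate_sum)
```

Under this reading, moving from 1.7 to 3.5 would mean that a rollout completes roughly two more of its five chained tasks on average before the first failure.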

Abstract

Recently, robotics has advanced significantly through the integration of larger models and large-scale datasets. However, challenges remain in applying these models to 3D spatial interactions and managing data collection costs. To address these issues, we propose the multimodal robotic manipulation model RoboTron-Mani and the comprehensive dataset RoboData. On the one hand, RoboTron-Mani enhances 3D perception through camera parameters and occupancy supervision. On the other hand, it incorporates Modality-Isolation-Mask and multimodal decoder blocks based on OpenFlamingo, improving modality fusion and fine-grained perception. RoboData integrates several publicly available datasets, achieving the first fusion of multi-view images, camera parameters, depth maps, actions, and space alignment, which facilitates comprehensive learning from diverse robotic datasets and offers one complete…

Code & Models

Repositories

RoboUniview/RoboMM (PyTorch, official)

Taxonomy

Topics: Robot Manipulation and Learning