CollaBot: Vision-Language Guided Simultaneous Collaborative Manipulation

Kun Song; Shentao Ma; Gaoming Chen; Ninglong Jin; Guangbao Zhao; Mingyu Ding; Zhenhua Xiong; Jia Pan

arXiv:2508.03526 · cs.RO · August 6, 2025


TL;DR

CollaBot is a versatile framework enabling multiple robots to collaboratively manipulate large objects by integrating scene segmentation, grasp planning, and collision-free trajectory generation, demonstrating effectiveness across various scenarios.

Contribution

This work introduces CollaBot, a scalable and generalist framework for multi-robot collaborative manipulation that generalizes to different robot sizes and task types.

Findings

1. 52% success rate across various scenarios
2. Effective scene segmentation and grasp planning
3. Collision-free trajectory generation demonstrated

Abstract

A central research topic in robotics is how to use robotic systems to interact with the physical world. Traditional manipulation tasks primarily focus on small objects. However, factory and home environments often require moving large objects, such as tables. These tasks typically require multiple robots to work collaboratively. Previous research lacks a framework that can scale to robots of arbitrary size and generalize to various kinds of tasks. In this work, we propose CollaBot, a generalist framework for simultaneous collaborative manipulation. First, we use SEEM to segment the scene and extract the target object's point cloud. Then, we propose a collaborative grasping framework that decomposes the task into local grasp pose generation and global collaboration. Finally, we design a two-stage planning module that can generate collision-free…
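The first pipeline step, turning a segmentation mask into the target object's point cloud, can be illustrated with standard pinhole back-projection. The sketch below is a minimal illustration, not CollaBot's implementation. It assumes a depth image registered to the RGB frame, a boolean object mask such as a segmentation model like SEEM might produce, and known camera intrinsics. The function name `mask_to_point_cloud` and its parameters are hypothetical.

```python
import numpy as np


def mask_to_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-frame 3D points.

    Hypothetical helper for illustration only.
    depth: (H, W) metric depth in meters; 0 marks missing readings.
    mask:  (H, W) boolean array, True where the object was segmented.
    fx, fy, cx, cy: pinhole intrinsics in pixels.
    Returns an (N, 3) array of [X, Y, Z] points.
    """
    depth = np.asarray(depth, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if depth.shape != mask.shape:
        raise ValueError("depth and mask must have the same shape")

    # Keep only object pixels that have a valid depth reading.
    valid = mask & (depth > 0)
    v, u = np.nonzero(valid)  # v = row index, u = column index
    z = depth[v, u]

    # Invert the pinhole projection u = fx*X/Z + cx, v = fy*Y/Z + cy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


if __name__ == "__main__":
    # Toy 4x4 depth image at 2 m, with a 2x2 object mask in the center.
    depth = np.full((4, 4), 2.0)
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True
    points = mask_to_point_cloud(depth, mask, fx=100.0, fy=100.0, cx=1.5, cy=1.5)
    print(points.shape)  # (4, 3)
```

In a full pipeline, the resulting object-only cloud would be the input to grasp pose generation. Pixels with zero depth are dropped rather than guessed, so the cloud may be sparser than the mask.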
