Tool-RoCo: An Agent-as-Tool Self-organization Large Language Model Benchmark in Multi-robot Cooperation

Ke Zhang; Xiaoning Zhao; Ce Zheng; Jiahong Ning; Dandan Zhu; Wenqi Zhang; Chen Sun; Toshiharu Sugawara

arXiv:2511.21510 · cs.MA · December 2, 2025


TL;DR

Tool-RoCo introduces a benchmark for evaluating large language models in multi-robot cooperation, focusing on agent autonomy and self-organization through tool usage across various cooperation paradigms and tasks.

Contribution

It presents a novel benchmark that assesses LLM-based multi-agent cooperation and autonomy by treating other agents as tools and defining multiple cooperation paradigms.

Findings

1. LLMs rarely invoke other agents as tools (7.09%).
2. Most tools used are activation tools (96.42%).
3. The benchmark provides a systematic way to evaluate LLM autonomy in multi-agent tasks.
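Figures such as 7.09% and 96.42% are shares of counted tool calls. The sketch below shows that arithmetic on a hypothetical tool-call log, assuming each call is tagged with a tool category. The category labels ("action", "cooperative", "activation"), the log, and the function name `category_shares` are illustrative assumptions; this summary does not state the exact denominator Tool-RoCo uses for each percentage.

```python
from collections import Counter
from typing import Dict, List, Tuple


def category_shares(log: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return each tool category's share of all tool calls, as a percentage.

    `log` is a hypothetical list of (agent_name, tool_category) entries.
    """
    counts = Counter(category for _, category in log)
    total = sum(counts.values())
    if total == 0:
        return {}  # no tool calls recorded
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Invented example log, not benchmark data.
log = [
    ("alice", "action"),
    ("bob", "action"),
    ("alice", "cooperative"),
    ("bob", "activation"),
]
print(category_shares(log))
```

Under this assumption, a "cooperative" share of 7.09% would mean roughly 7 of every 100 logged calls invoked another agent.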

Abstract

This study proposes Tool-RoCo, a novel benchmark for evaluating large language models (LLMs) in long-term multi-agent cooperation based on RoCo, a multi-robot cooperative benchmark. Recent research on LLM-based multi-agent systems has relied on predefined orchestration, while ignoring agent autonomy. Tool-RoCo treats other agents as tools and introduces cooperative tools, leveraging tool usage to evaluate multi-agent cooperation and self-organization. Tool usage means that each agent (LLM) selects a tool from a candidate set based on the current state, receives feedback, and adjusts its selection in subsequent rounds. To evaluate different autonomy levels, we propose four LLM paradigms: (1) centralized cooperation, where a single LLM allocates tools to all agents; (2) centralized self-organization, where a central LLM autonomously activates agents while keeping others inactive; (3)…
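The select, receive-feedback, adjust loop described in the abstract, with another agent exposed as a cooperative tool, can be sketched as follows. This is a minimal illustration under stated assumptions, not Tool-RoCo's implementation: the names `Agent`, `as_tool`, and `run_rounds`, the tool names, and the reward values are hypothetical, and a simple score table stands in for the LLM's tool choice. Invoking the peer through its tool activates it, loosely mirroring how a self-organizing agent can bring an inactive agent into play.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Hypothetical agent; a score table stands in for the LLM's tool choice."""
    name: str
    active: bool = True
    tools: Dict[str, Callable[[], float]] = field(default_factory=dict)
    scores: Dict[str, float] = field(default_factory=dict)

    def select_tool(self, rng: random.Random) -> str:
        # Pick the tool with the highest score so far; break ties at random.
        best = max(self.scores.get(t, 0.0) for t in self.tools)
        ties = [t for t in self.tools if self.scores.get(t, 0.0) == best]
        return rng.choice(ties)

    def update(self, tool: str, feedback: float) -> None:
        # Adjust the preference for the chosen tool from this round's feedback.
        self.scores[tool] = self.scores.get(tool, 0.0) + feedback


def as_tool(peer: Agent) -> Callable[[], float]:
    """Wrap another agent as a cooperative tool (agent-as-tool)."""
    def call() -> float:
        peer.active = True  # invoking a peer activates it
        return 1.0          # hypothetical reward: the joint action succeeded
    return call


def run_rounds(agent: Agent, rounds: int, seed: int = 0) -> List[str]:
    """Each round: select a tool, execute it, receive feedback, adjust scores."""
    rng = random.Random(seed)
    history: List[str] = []
    for _ in range(rounds):
        tool = agent.select_tool(rng)
        feedback = agent.tools[tool]()
        agent.update(tool, feedback)
        history.append(tool)
    return history


bob = Agent("bob", active=False)  # starts inactive
alice = Agent("alice", tools={
    "move_alone": lambda: -0.5,  # hypothetical: acting alone fails
    "call_bob": as_tool(bob),
})
print(run_rounds(alice, rounds=3), bob.active)
```

Whatever the agent picks in the first round, the negative feedback on `move_alone` steers it toward `call_bob`, so by the last round it invokes the peer and `bob` is active. In the benchmark itself, an LLM performs the selection from the current state instead of this score table.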


Taxonomy

Topics: Language and cultural evolution · Topic Modeling · Multimodal Machine Learning Applications