Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science

Jingru Fan; Dewen Liu; Yufan Dang; Huatao Li; Yuheng Wang; Wei Liu; Feiyu Duan; Xuanwen Ding; Shu Yao; Lin Wu; Ruijie Shi; Wai-Shing Leung; Yuan Cheng; Zhongyu Wei; Cheng Yang; Chen Qian; Zhiyuan Liu; Maosong Sun

arXiv:2602.05289·cs.CL·February 6, 2026

Open Access

TL;DR

This paper argues for a scientific framework for LLM-based Multi-Agent Systems (MAS), built on a unified metric and factor attribution, to replace trial-and-error design and enable systematic progress.

Contribution

It introduces a unified collaboration gain metric and a factor attribution paradigm, along with a systematic MAS factor library, to establish a rigorous scientific methodology.

Findings

01

Proposes the collaboration gain metric ($\Gamma$) as a standard for intrinsic collaboration evaluation.

02

Develops a factor attribution paradigm to identify key factors driving collaboration.

03

Constructs a systematic MAS factor library to structure the design space.

Abstract

Recent advancements in Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS), demonstrating significant effectiveness across a wide range of complex and open-ended domains. However, despite this rapid progress, the field still relies heavily on empirical trial-and-error and lacks the unified, principled scientific framework necessary for systematic optimization and improvement. This bottleneck stems from the ambiguity of attribution: first, the absence of a structured taxonomy of factors leaves researchers restricted to unguided adjustments; second, the lack of a unified metric leaves genuine collaboration gain indistinguishable from mere resource accumulation. In this paper, we advocate for a transition to design science through an integrated framework. We advocate establishing the collaboration gain metric ($\Gamma$) as the scientific standard…

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Language and cultural evolution