Superficial Success vs. Internal Breakdown: An Empirical Study of Generalization in Adaptive Multi-Agent Systems
Namyoung So, Seokgyu Jang, and Taeuk Kim

TL;DR
This empirical study investigates the generalization capabilities of adaptive multi-agent systems (MAS) and reveals two failure modes, topological overfitting and illusory coordination, that call their practical utility into question.
Contribution
The paper presents an extensive empirical analysis of generalization in adaptive MAS, identifies two key failure modes (topological overfitting and illusory coordination), and advocates evaluation protocols that extend beyond final-answer correctness.
Findings
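Topological overfitting: adaptive MAS fail to generalize across different domains.
Illusory coordination: they achieve reasonable surface-level accuracy while the underlying agent interactions diverge from ideal MAS behavior.
Both findings point to the need to prioritize generalization in MAS development and to evaluate beyond final-answer correctness.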
Abstract
Adaptive multi-agent systems (MAS) are increasingly adopted to tackle complex problems. However, the narrow task coverage of their optimization raises the question of whether they can function as general-purpose systems. To address this gap, we conduct an extensive empirical study of adaptive MAS, revealing two key findings: (1) topological overfitting -- they fail to generalize across different domains; and (2) illusory coordination -- they achieve reasonable surface-level accuracy while the underlying agent interactions diverge from ideal MAS behavior, raising concerns about their practical utility. These findings highlight the pressing need to prioritize generalization in MAS development and motivate evaluation protocols that extend beyond simple final-answer correctness.
