Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets
Dat Tran, Douwe Kiela

TL;DR
This study shows that single-agent large language models often match or outperform multi-agent systems on multi-hop reasoning when both are given equal reasoning-token budgets, challenging the perceived advantages of multi-agent architectures.
Contribution
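The paper provides an information-theoretic argument, grounded in the Data Processing Inequality, together with empirical evidence that single-agent systems are more information-efficient under a fixed reasoning-token budget, calling into question earlier claims of multi-agent gains that did not control for test-time computation.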
Findings
Single-agent models match or outperform multi-agent systems on reasoning tasks with equal token budgets.
API-based budget controls and standard benchmarks can inflate multi-agent system performance.
Multi-agent advantages are often due to unaccounted computation and context effects, not architecture.
Abstract
Recent work reports strong performance from multi-agent LLM systems (MAS), but these gains are often confounded by increased test-time computation. When computation is normalized, single-agent systems (SAS) can match or outperform MAS, yet the theoretical basis and evaluation methodology behind this comparison remain unclear. We present an information-theoretic argument, grounded in the Data Processing Inequality, suggesting that under a fixed reasoning-token budget and with perfect context utilization, single-agent systems are more information-efficient. This perspective further predicts that multi-agent systems become competitive when a single agent's effective context utilization is degraded, or when more compute is expended. We test these predictions in a controlled empirical study across three model families (Qwen3, DeepSeek-R1-Distill-Llama, and Gemini 2.5), comparing SAS with…
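For readers unfamiliar with the Data Processing Inequality mentioned in the abstract, the block below states the standard inequality and gives one illustrative way to read it in this setting. The symbols $A$ (the correct answer), $C$ (the full context available to a single agent), and $M$ (a message one agent passes to another) are assumptions introduced here for illustration; they are not the paper's notation, and this is a sketch of the intuition rather than the paper's formal argument.

```latex
% Standard Data Processing Inequality:
% if X -> Y -> Z is a Markov chain, then
\[
  I(X; Z) \le I(X; Y).
\]

% Illustrative reading (assumed notation, not the paper's):
% A = correct answer, C = full context, M = inter-agent message.
% If M is produced from C alone, then A -> C -> M is a Markov chain, so
\[
  I(A; M) \le I(A; C).
\]
% A downstream agent that sees only M cannot hold more information
% about A than an agent that sees all of C.
```

Under this reading, the inequality only bounds the information available; it matches the abstract's caveat that the single-agent advantage assumes perfect context utilization, and says nothing about cases where a single agent fails to use its full context effectively.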
