Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling

Florian Valentin Wunderlich; Lars Benedikt Kaesberg; Jan Philip Wahle; Terry Ruas; Bela Gipp

arXiv:2605.01566·cs.AI·May 5, 2026

TL;DR

This paper systematically analyzes inference scaling strategies for language models, including multi-agent methods, and identifies the Pareto-optimal configurations that balance accuracy against compute cost across benchmarks and model sizes.

Contribution

The paper presents a comprehensive evaluation of inference scaling methods, showing that multi-agent debate and mixture-of-agents outperform self-consistency when compute budgets are matched.

Findings

01

Inference scaling improves accuracy on MMLU-Pro by up to 7.1%.

02

Debate and mixture-of-agents outperform self-consistency at equal compute budgets.

03

Mixture-of-agents is most efficient when parallel generations exceed sequential aggregations.

Abstract

Advances in inference methods have enabled language models to improve their predictions without additional training. These methods often prioritize raw performance over cost-effective compute usage. However, computational efficiency is key for real-world applications with resource constraints. We provide a systematic analysis of four inference scaling strategies (self-consistency, self-refinement, multi-agent debate, and mixture-of-agents) to study their compute-performance trade-offs. We evaluate the methods on two reasoning benchmarks (MMLU-Pro, BBH) and include extensive parameter configurations (e.g., scaling the number of parallel predictions, agents, and debate rounds) across different model sizes. Across 34 configurations and over 100 evaluations, we compute the Pareto-optimal front to select methods that achieve the best accuracy with the lowest computational budget. Notably,…
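The Pareto-optimal front mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the `Config` type, the `pareto_front` function, the cost unit, and all numbers in the demo list are hypothetical placeholders chosen for illustration, not results from the paper. The sketch assumes each configuration is summarized by a single compute cost and a single accuracy, and treats a configuration as Pareto-optimal when no other configuration is at least as cheap and at least as accurate while being strictly better on one of the two.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One evaluated inference configuration (hypothetical fields)."""
    name: str
    cost: float      # assumed compute budget, e.g. generated tokens
    accuracy: float  # assumed benchmark accuracy in [0, 1]


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations not dominated by any other one.

    Sorting by ascending cost (ties broken by descending accuracy)
    means a configuration is on the front exactly when it is more
    accurate than every cheaper-or-equal configuration seen so far.
    """
    front: List[Config] = []
    best_accuracy = float("-inf")
    for cfg in sorted(configs, key=lambda c: (c.cost, -c.accuracy)):
        if cfg.accuracy > best_accuracy:
            front.append(cfg)
            best_accuracy = cfg.accuracy
    return front


# Placeholder numbers for illustration only, not taken from the paper.
demo = [
    Config("a", cost=1.0, accuracy=0.50),
    Config("b", cost=2.0, accuracy=0.48),  # dominated by "a"
    Config("c", cost=3.0, accuracy=0.55),
    Config("d", cost=3.0, accuracy=0.53),  # dominated by "c"
]

front_names = [cfg.name for cfg in pareto_front(demo)]
print(front_names)
```

Sorting first keeps the scan at O(n log n) instead of comparing every pair of configurations, which matters little for a few dozen configurations but keeps the logic short.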
