Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness
Yongjin Yang, Euiin Yi, Jongwoo Ko, Kimin Lee, Zhijing Jin, Se-Young Yun

TL;DR
This paper systematically evaluates multi-agent debate (MAD) as a test-time scaling method, characterizing its strengths and limitations across tasks, model sizes, and agent configurations to guide future MAD system development.
Contribution
It provides a comprehensive empirical comparison of MAD against self-agent methods, identifying the conditions under which MAD excels or falls short.
Findings
MAD offers limited benefits for mathematical reasoning, but its benefit grows with task difficulty.
Agent diversity shows little impact on mathematical reasoning performance.
In safety tasks, MAD can increase vulnerability, but diverse agent configurations help reduce attack success.
Abstract
The remarkable growth in large language model (LLM) capabilities has spurred exploration into multi-agent systems, with debate frameworks emerging as a promising avenue for enhanced problem-solving. These multi-agent debate (MAD) approaches, where agents collaboratively present, critique, and refine arguments, potentially offer improved reasoning, robustness, and diverse perspectives over monolithic models. Despite prior studies leveraging MAD, a systematic understanding of its effectiveness compared to self-agent methods, particularly under varying conditions, remains elusive. This paper seeks to fill this gap by conceptualizing MAD as a test-time computational scaling technique, distinguished by collaborative refinement and diverse exploration capabilities. We conduct a comprehensive empirical investigation comparing MAD with strong self-agent test-time scaling baselines on…
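The contrast the abstract draws between MAD and a self-agent baseline can be illustrated with a minimal sketch. This is not the paper's implementation: the `Agent` type, the function names, the toy agents, and the use of majority voting to aggregate answers are all assumptions made for illustration, and a real agent would wrap an LLM call.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical agent interface: (question, peers' previous answers) -> answer.
# A real system would call an LLM here; these are stand-ins for illustration.
Agent = Callable[[str, Sequence[str]], str]


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer (assumed aggregation rule)."""
    return Counter(answers).most_common(1)[0][0]


def self_consistency(agent: Agent, question: str, n_samples: int) -> str:
    """Self-agent baseline: sample one agent independently, then vote."""
    return majority_vote([agent(question, []) for _ in range(n_samples)])


def multi_agent_debate(agents: Sequence[Agent], question: str, rounds: int) -> str:
    """Debate: each round, every agent sees its peers' last answers and may revise."""
    # Round 0: every agent answers independently, with no peer context.
    answers: List[str] = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # All revisions in a round are based on the previous round's answers.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return majority_vote(answers)


def stubborn(answer: str) -> Agent:
    """Toy agent that never changes its answer."""
    return lambda question, peers: answer


def conformist(initial: str) -> Agent:
    """Toy agent that adopts its peers' majority answer once it sees one."""
    return lambda question, peers: majority_vote(peers) if peers else initial


if __name__ == "__main__":
    agents = [stubborn("4"), stubborn("4"), conformist("5")]
    print(multi_agent_debate(agents, "What is 2 + 2?", rounds=1))
    print(self_consistency(stubborn("4"), "What is 2 + 2?", n_samples=3))
```

In this sketch both methods spend extra computation at test time; the difference is that debate feeds each agent its peers' answers between rounds, whereas the self-agent baseline aggregates independent samples.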
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Reinforcement Learning in Robotics · Topic Modeling
