Demystifying Multi-Agent Debate: The Role of Confidence and Diversity

Xiaochen Zhu; Caiqi Zhang; Yizhou Chi; Tom Stafford; Nigel Collier; Andreas Vlachos

arXiv:2601.19921·cs.CL·January 29, 2026

TL;DR

This paper improves multi-agent debate (MAD) for large language models with two interventions, a diversity-aware initialisation and a confidence-modulated debate protocol, which outperform vanilla MAD and majority vote on six benchmarks.

Contribution

It identifies two mechanisms missing from vanilla MAD, diversity of initial viewpoints and explicit, calibrated confidence communication, and proposes two lightweight interventions that supply them and improve debate outcomes.

Findings

01

Diversity-aware initialisation increases the chance of correct hypotheses.

02

Confidence-modulated updates guide debates towards correct answers.

03

Methods outperform vanilla MAD and majority vote on six benchmarks.
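
To make the first finding concrete, here is a minimal sketch of what a diversity-aware initialisation could look like. It is not the authors' procedure, which this page does not spell out. It assumes a pool of sampled candidate answers is already available as strings, and the function name `select_diverse` and the fill-by-frequency rule are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def select_diverse(candidates: Sequence[str], k: int) -> List[str]:
    """Pick k initial answers from a sampled pool, preferring distinct answers.

    Hypothetical helper: distinct answers are taken in order of frequency,
    and any leftover slots are filled by cycling through them again.
    """
    if k <= 0:
        return []
    # most_common() sorts by count (descending); ties keep first-seen order.
    distinct = [answer for answer, _ in Counter(candidates).most_common()]
    chosen = distinct[:k]
    i = 0
    # Fewer distinct answers than slots: reuse them round-robin.
    while len(chosen) < k and distinct:
        chosen.append(distinct[i % len(distinct)])
        i += 1
    return chosen


# Illustrative pool of eight sampled answers (made-up values).
pool = ["12", "12", "12", "15", "12", "9", "15", "12"]
naive_init = list(pool[:3])             # first three samples: all "12"
diverse_init = select_diverse(pool, 3)  # one slot per distinct answer
```

In this toy pool the first three samples are identical, so a debate seeded with them starts with a single hypothesis, while the diverse selection starts with three. If the correct answer were "15" or "9", only the second starting point would contain it.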

Abstract

Multi-agent debate (MAD) is widely used to improve large language model (LLM) performance through test-time scaling, yet recent work shows that vanilla MAD often underperforms simple majority vote despite higher computational cost. Studies show that, under homogeneous agents and uniform belief updates, debate preserves expected correctness and therefore cannot reliably improve outcomes. Drawing on findings from human deliberation and collective decision-making, we identify two key mechanisms missing from vanilla MAD: (i) diversity of initial viewpoints and (ii) explicit, calibrated confidence communication. We propose two lightweight interventions. First, a diversity-aware initialisation that selects a more diverse pool of candidate answers, increasing the likelihood that a correct hypothesis is present at the start of debate. Second, a confidence-modulated debate protocol in which…
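
The claim that uniform updates preserve expected correctness can be illustrated with a toy model. The sketch below is an assumption made for illustration, not the paper's formal setup or its protocol (the abstract is truncated here): in each round every agent adopts the answer of one peer, drawn with probability proportional to a weight. The function name and the confidence values are hypothetical.

```python
from typing import Sequence


def expected_correct_after_round(
    is_correct: Sequence[bool], weights: Sequence[float]
) -> float:
    """Expected fraction of correct agents after one toy debate round.

    Toy model (an assumption): each agent adopts the answer of a peer drawn
    with probability weights[j] / sum(weights), so the expected fraction of
    correct agents equals the probability that the drawn peer is correct.
    """
    if len(is_correct) != len(weights):
        raise ValueError("is_correct and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w for c, w in zip(is_correct, weights) if c) / total


# One correct agent out of four (made-up example).
is_correct = [True, False, False, False]
before = sum(is_correct) / len(is_correct)

# Uniform weights: every peer is equally persuasive.
after_uniform = expected_correct_after_round(is_correct, [1.0] * 4)

# Hypothetical confidences where the correct agent is the most confident.
after_confident = expected_correct_after_round(is_correct, [0.9, 0.5, 0.5, 0.5])

# Miscalibrated case: the correct agent is the least confident.
after_miscalibrated = expected_correct_after_round(is_correct, [0.3, 0.9, 0.9, 0.9])
```

Under uniform weights the expected fraction stays at its starting value, which is the sense in which debate alone cannot reliably help in this toy model. Weighting by confidence shifts the expectation upward only when confidence tracks correctness; in the miscalibrated case it shifts downward, which is why the abstract's emphasis on calibrated confidence matters.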

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI · Opinion Dynamics and Social Influence