When Two LLMs Debate, Both Think They'll Win
Pradyumna Shyama Prasad, Minh Nhat Nguyen

TL;DR
This study evaluates large language models in a multi-turn debate setting, revealing systematic overconfidence, mutual overestimation, and misalignment between reasoning and confidence, raising concerns about their self-assessment abilities.
Contribution
It introduces a novel dynamic debate framework to assess LLMs' confidence calibration and uncovers significant overconfidence and reasoning-confidence misalignments.
Findings
Models exhibit systematic overconfidence: initial confidence averages 72.9%, against a rational baseline of 50%.
Confidence tends to escalate rather than decrease during debates.
Mutual overestimation occurs in over 60% of debates.
Abstract
Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language Models (LLMs) in a dynamic, adversarial debate setting, uniquely combining two realistic factors: (a) a multi-turn format requiring models to update beliefs as new information emerges, and (b) a zero-sum structure to control for task-related uncertainty, since mutual high-confidence claims imply systematic overconfidence. We organized 60 three-round policy debates among ten state-of-the-art LLMs, with models privately rating their confidence (0-100) in winning after each round. We observed five concerning patterns: (1) Systematic overconfidence: models began debates with average initial confidence of 72.9% vs. a rational 50% baseline. (2) Confidence escalation: rather than reducing confidence as…
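The zero-sum argument in the abstract can be made concrete with a small sketch. In a debate with exactly one winner, the two sides' win probabilities should sum to 100, so the average stated confidence of a calibrated pair is 50. The confidence pairs below are hypothetical illustration data, not results from the paper, and the function names and the threshold of 75 used to flag "mutual high confidence" are likewise assumptions made for this example.

```python
from statistics import mean

# Hypothetical (side_a, side_b) confidence ratings on a 0-100 scale,
# one pair per debate. These are NOT the paper's data.
debates = [(75, 80), (60, 70), (85, 55), (70, 90)]

# In a zero-sum debate the two win probabilities sum to 100,
# so the average confidence of calibrated debaters is 50.
RATIONAL_BASELINE = 50.0


def mean_overconfidence(pairs):
    """Average stated confidence minus the 50-point zero-sum baseline."""
    all_conf = [c for pair in pairs for c in pair]
    return mean(all_conf) - RATIONAL_BASELINE


def excess_probability(pairs):
    """Per-debate amount by which the two confidences exceed 100 in total."""
    return [a + b - 100 for a, b in pairs]


def mutual_high_confidence_rate(pairs, threshold=75):
    """Fraction of debates in which BOTH sides rate themselves >= threshold.

    The threshold is an assumed cutoff chosen for illustration.
    """
    flagged = [a >= threshold and b >= threshold for a, b in pairs]
    return sum(flagged) / len(pairs)


if __name__ == "__main__":
    print("mean overconfidence:", mean_overconfidence(debates))
    print("excess probability per debate:", excess_probability(debates))
    print("mutual high-confidence rate:", mutual_high_confidence_rate(debates))
```

Any positive value from `excess_probability` means at least one side in that debate is overconfident, regardless of how hard the topic is. That is why the zero-sum structure controls for task-related uncertainty.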
Taxonomy
Topics: Corporate Governance and Law
