TL;DR
This paper introduces ColMAD, a collaborative multi-agent debate protocol that improves error detection in large language models by encouraging agents to support each other, leading to more accurate oversight.
Contribution
The paper proposes a novel non-zero-sum collaborative debate protocol, ColMAD, which enhances error detection by fostering supportive criticism among agents and reduces debate hacking.
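The sketch below makes the zero-sum versus non-zero-sum distinction concrete by comparing two toy reward rules for a two-debater setup. It is an illustration only, not the paper's formalization. The function names `zero_sum_rewards` and `collaborative_rewards` and their reward values are hypothetical. Under the zero-sum rule, one debater's gain is exactly the other's loss, whether or not the judge's verdict is correct. Under the collaborative rule, both debaters share a single reward that depends only on whether the verdict is correct.

```python
from typing import Tuple


def zero_sum_rewards(winner: int) -> Tuple[float, float]:
    """Hypothetical competitive rule: the debater the judge sides with gets +1,
    the other gets -1. Correctness of the verdict plays no role."""
    return (1.0, -1.0) if winner == 0 else (-1.0, 1.0)


def collaborative_rewards(verdict_correct: bool) -> Tuple[float, float]:
    """Hypothetical collaborative rule: both debaters share one reward that
    depends only on whether the judge's final verdict is correct."""
    shared = 1.0 if verdict_correct else 0.0
    return (shared, shared)


if __name__ == "__main__":
    # Zero-sum: rewards always cancel, so persuading the judge is what pays.
    for winner in (0, 1):
        r = zero_sum_rewards(winner)
        print("zero-sum, winner", winner, r, "sum =", sum(r))
    # Collaborative: rewards move together and track correctness, not persuasion.
    for correct in (True, False):
        r = collaborative_rewards(correct)
        print("collaborative, correct =", correct, r, "sum =", sum(r))
```

In the zero-sum rule, nothing rewards being right, only being believed. This is the incentive the abstract links to debate hacking.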
Findings
ColMAD outperforms prior competitive MAD protocols by 19% in error detection.
ColMAD shows significant improvements over single-agent methods.
Collaborative debate reduces misleading tactics in error detection.
Abstract
Accurate detection of errors in large language model (LLM) responses is central to the success of scalable oversight, that is, providing effective supervision to superhuman intelligence. Yet self-diagnosis is often unreliable on complex tasks unless aided by reliable external feedback. Multi-agent debate (MAD) seems to be a natural alternative to external feedback: multiple LLMs provide complementary perspectives and cross-checks for error detection. However, prior MAD protocols frame debate as a zero-sum game, where the debaters compete to win the game instead of seeking the truth. This leads to debate hacking: debaters tend to mislead the judge by misinterpreting the task or presenting overconfident claims, which introduces more mistakes and causes MAD to underperform single-agent methods. To mitigate the issue, we introduce a new collaborative MAD protocol, termed ColMAD, that reframes MAD…
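The debate setting described in the abstract can be sketched as a simple loop. Several debaters inspect the same LLM response, each one sees the shared transcript, and a judge decides whether the response contains an error. This is a minimal sketch under stated assumptions, not the paper's implementation. The round structure, the function names (`collaborative_debate`, `arithmetic_checker`, `units_checker`, `any_error_judge`), and the rule-based stub agents are all hypothetical stand-ins for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A debater maps (question, response, transcript so far) to one new message.
# In practice this would wrap an LLM call; here it is any plain callable.
DebaterFn = Callable[[str, str, List[str]], str]
# A judge maps (question, response, full transcript) to True if it finds an error.
JudgeFn = Callable[[str, str, List[str]], bool]


@dataclass
class DebateResult:
    transcript: List[str] = field(default_factory=list)
    has_error: bool = False


def collaborative_debate(question: str, response: str, debaters: List[DebaterFn],
                         judge: JudgeFn, rounds: int = 2) -> DebateResult:
    """Hypothetical collaborative loop: each debater sees the shared transcript,
    so it can build on earlier points instead of rebutting them. The judge reads
    the complete transcript once, after all rounds."""
    transcript: List[str] = []
    for _ in range(rounds):
        for debater in debaters:
            transcript.append(debater(question, response, transcript))
    return DebateResult(transcript=transcript,
                        has_error=judge(question, response, transcript))


# --- Stub agents for illustration only (no LLM involved) ---

def arithmetic_checker(question: str, response: str, transcript: List[str]) -> str:
    finding = "ERROR: 2 + 2 is 4, not 5"
    if "2 + 2 = 5" not in response:
        return "no arithmetic issue"
    # Complement rather than repeat: if the point is already on record, defer to it.
    return "agree with earlier arithmetic finding" if finding in transcript else finding


def units_checker(question: str, response: str, transcript: List[str]) -> str:
    # Covers a different aspect of the response than arithmetic_checker.
    return "units look fine" if "meters" in response else "ERROR: missing units"


def any_error_judge(question: str, response: str, transcript: List[str]) -> bool:
    # Flags the response if any debater put an error on record.
    return any(msg.startswith("ERROR") for msg in transcript)


if __name__ == "__main__":
    result = collaborative_debate("How long is the rope?", "2 + 2 = 5, so it is 5 meters",
                                  [arithmetic_checker, units_checker], any_error_judge)
    for msg in result.transcript:
        print(msg)
    print("error detected:", result.has_error)
```

With the default two rounds, the arithmetic checker raises its finding once and then defers to it in the second round. This mimics debaters that complement each other rather than contest each other's points.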
Peer Reviews
Decision·Submitted to ICLR 2026
Exploring new debate protocols empirically and theoretically is an important topic. The paper attempts to formalize situations in which collaborative debate outperforms competitive debate.
1. The theoretical results in this paper do not really prove anything and are difficult to parse, as the assumptions are not clearly stated. First, Proposition 1 shows that competitive debate does not improve over no debate at all. The assumption required for this is not given in the statement of the proposition, but the proof in the appendix reveals it: the competing debaters' equilibrium strategy provides no information to the judge. Proposition 2 sh
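The assumption identified in this review can be illustrated with a short Bayes-rule calculation. This is a toy model, not the paper's proof. Suppose the debate transcript is equally likely whether or not the response contains an error. Then the judge's posterior equals its prior, so debate adds nothing over no debate. The function name `posterior_error` and the probabilities used below are hypothetical.

```python
def posterior_error(prior: float, p_signal_given_error: float,
                    p_signal_given_ok: float) -> float:
    """Judge's posterior P(error | signal) via Bayes' rule for a binary hypothesis."""
    numerator = p_signal_given_error * prior
    evidence = numerator + p_signal_given_ok * (1.0 - prior)
    return numerator / evidence


if __name__ == "__main__":
    # Uninformative debate: identical likelihoods, so the posterior stays at the prior.
    print(round(posterior_error(0.3, 0.6, 0.6), 4))  # → 0.3
    # Informative debate: the signal is likelier under "error", so the posterior rises.
    print(round(posterior_error(0.3, 0.9, 0.2), 4))  # → 0.6585
```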
1. **Clear motivation and observation.** Competitive MAD protocols often result in performance degradation due to their zero-sum nature.
   - Debaters in competitive MAD may misinterpret tasks and present overconfident claims, leading to misleading outcomes.
   - Debate hacking behaviors, such as fake evidence and fallacious arguments, are prevalent in competitive settings.
2. **ColMAD.** This paper proposes a new MAD protocol called Collaborative Multi-Agent Debate (ColMAD) that reframes MAD as a
1. In Tables 1 and 2, ColMAD performs substantially better than CopMAD but only slightly outperforms the Ensemble baseline. The paper would benefit from a deeper analysis of this comparison, for example, discussing why Ensemble achieves similar results and what unique advantages ColMAD provides beyond simple model aggregation.
* This paper applies MAD to error detection, which extends the application boundary of MAD systems.
* The evaluation is comprehensive, covering a range of LLMs and benchmarks and demonstrating ColMAD's superior performance.
* The paper is well written and easy to follow.
* The argument that "previous approaches often frame MAD as a zero-sum game where the debaters compete with each other" is not convincing. I believe most MAD systems are not framed as zero-sum games, and there are no references or empirical evidence to support this argument. While some MAD systems encourage agents to debate against each other, they cannot strictly be considered zero-sum games either.
* The major contribution, "ColMAD asks debaters to collaborate and complement each other’s mi
