Towards Scalable Oversight with Collaborative Multi-Agent Debate in Error Detection

Yongqiang Chen; Gang Niu; James Cheng; Bo Han; Masashi Sugiyama

arXiv:2510.20963·cs.LG·October 27, 2025

3 Reviews

TL;DR

This paper introduces ColMAD, a collaborative multi-agent debate protocol that improves error detection in large language models by encouraging agents to support each other, leading to more accurate oversight.

Contribution

The paper proposes a novel non-zero-sum collaborative debate protocol, ColMAD, which enhances error detection by fostering supportive criticism among agents and reducing debate hacking.

Findings

01

ColMAD outperforms previous MAD by 19% in error detection.

02

ColMAD shows significant improvements over single-agent methods.

03

Collaborative debate reduces misleading tactics in error detection.

Abstract

Accurate detection of errors in large language model (LLM) responses is central to the success of scalable oversight, or providing effective supervision to superhuman intelligence. Yet, self-diagnosis is often unreliable on complex tasks unless aided by reliable external feedback. Multi-agent debate (MAD) seems to be a natural alternative to external feedback: multiple LLMs provide complementary perspectives and cross-checks for error detection. However, prior MAD protocols frame debate as a zero-sum game, where the debaters compete to win the game instead of seeking the truth. Consequently, this leads to debate hacking: debaters tend to mislead the judge by misinterpreting the task or presenting overconfident claims, which introduces more mistakes and causes debate to underperform single-agent methods. To mitigate the issue, we introduce a new collaborative MAD protocol, termed ColMAD, that reframes MAD…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 5

Strengths

Exploring new debate protocols empirically and theoretically is an important topic. The paper attempts to formalize situations in which collaborative debate outperforms competitive debate.

Weaknesses

1. The theoretical results in this paper do not really prove anything, and are difficult to parse as the assumptions are not clearly stated. First, Proposition 1 shows that competitive debate does not improve over no debate at all. The assumption required for this is not stated in the statement of the proposition, but if we read the proof in the appendix, we find that the assumption required is: the competing debaters' equilibrium strategy provides no information to the judge. Proposition 2 sh

Reviewer 02·Rating 8·Confidence 2

Strengths

1. **Clear motivation and observation.** Competitive MAD protocols often result in performance degradation due to their zero-sum nature.
   - Debaters in competitive MAD may misinterpret tasks and present overconfident claims, leading to misleading outcomes.
   - Debate hacking behaviors, such as fake evidence and fallacious arguments, are prevalent in competitive settings.
2. **ColMAD.** This paper proposes a new MAD protocol called Collaborative Multi-Agent Debate (ColMAD) that reframes MAD as a

Weaknesses

1. From Tables 1 and 2, ColMAD shows substantially better performance than CopMAD but only slightly outperforms the Ensemble baseline. The paper would benefit from a deeper analysis of this comparison. For example, discussing why Ensemble achieves similar results and what unique advantages ColMAD provides beyond simple model aggregation.
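
To make the "simple model aggregation" comparison concrete, here is a minimal sketch of what a majority-vote ensemble over independent error-detection verdicts could look like. This is an assumption for illustration only: the excerpt does not describe how the paper's Ensemble baseline is built, and the function name `majority_vote` and the tie-breaking rule are hypothetical.

```python
from collections import Counter


def majority_vote(verdicts):
    """Aggregate independent per-model verdicts.

    Each verdict is a bool: True means "this response contains an error".
    Hypothetical helper; not taken from the paper.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    # Assumption: ties are resolved conservatively by flagging the response.
    return counts[True] >= counts[False]


if __name__ == "__main__":
    # Made-up verdicts from three models judging the same response.
    print(majority_vote([True, False, True]))
```

In such a scheme the models never see each other's reasoning, so any advantage of a debate protocol over it would have to come from the interaction between agents rather than from having several opinions.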

Reviewer 03·Rating 2·Confidence 3

Strengths

* This paper applies MAD in error detection, which extends the application boundary of MAD systems
* The evaluation is comprehensive covering a range of LLMs and benchmarks, demonstrating the superior improvement
* The paper is well written and easy to follow

Weaknesses

* The argument "as previous approaches often frame MAD as a zero-sum game where the debaters compete with each other" is not convincing. I believe most MAD systems are not framed as zero-sum games. There are no references or empirical evidence to support this argument. While some MAD systems encourage agents to debate against each other, even these cannot strictly be considered zero-sum games.
* The major contribution, "ColMAD asks debaters to collaborate and complement each other’s mi
