TL;DR
PROClaim introduces a structured courtroom-style multi-agent debate framework with Progressive RAG and role-switching, significantly improving the reliability of claim verification in high-stakes scenarios.
Contribution
It presents a novel structured debate framework with dynamic evidence refinement and model heterogeneity, outperforming existing methods on the Check-COVID benchmark.
Findings
Achieves 81.7% accuracy on Check-COVID, 10% higher than standard debate.
P-RAG contributes +7.5 percentage points to performance.
Structured deliberation reduces systematic biases.
Abstract
Large language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address this, they are limited by one-pass retrieval and unstructured debate dynamics. We propose a courtroom-style multi-agent framework, PROClaim, that reformulates verification as a structured, adversarial deliberation. Our approach integrates specialized roles (e.g., Plaintiff, Defense, Judge) with Progressive RAG (P-RAG) to dynamically expand and refine the evidence pool during the debate. Furthermore, we employ evidence negotiation, self-reflection, and heterogeneous multi-judge aggregation to enforce calibration, robustness, and diversity. In zero-shot evaluations on the Check-COVID benchmark, PROClaim achieves 81.7% accuracy, outperforming standard multi-agent debate by 10.0…
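The abstract describes two mechanisms at a high level: an evidence pool that grows during the debate (P-RAG) and a verdict aggregated across several judges. The Python sketch below illustrates both under stated assumptions. It is not PROClaim's implementation. The abstract does not give the retrieval stopping rule or the aggregation rule, so stopping when a round adds no new passages and using a plain majority vote are assumptions. Every name here (`progressive_retrieval`, `aggregate_verdicts`, `toy_retrieve`, the `SUPPORTED`/`REFUTED` labels) is hypothetical, and the sketch omits the courtroom roles and the LLM calls.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical interfaces; the source does not specify these.
Retriever = Callable[[str, Sequence[str]], List[str]]
Judge = Callable[[str, Sequence[str]], str]


def progressive_retrieval(claim: str, retrieve: Retriever, max_rounds: int = 3) -> List[str]:
    """Grow an evidence pool over several rounds, stopping early once a
    round contributes no unseen passages (an assumed stopping rule)."""
    pool: List[str] = []
    for _ in range(max_rounds):
        new = [p for p in retrieve(claim, pool) if p not in pool]
        if not new:
            break
        pool.extend(new)
    return pool


def aggregate_verdicts(judges: Sequence[Judge], claim: str, evidence: Sequence[str]) -> str:
    """Assumed majority vote over independent judges; on a tie,
    Counter.most_common returns the label encountered first."""
    votes = Counter(judge(claim, evidence) for judge in judges)
    return votes.most_common(1)[0][0]


# Toy stand-ins (all hypothetical) to make the sketch runnable.
CORPUS = ["passage A", "passage B", "passage C"]


def toy_retrieve(claim: str, seen: Sequence[str]) -> List[str]:
    # Return the next unseen passage, mimicking a refined query each round.
    remaining = [p for p in CORPUS if p not in seen]
    return remaining[:1]


judges = [
    lambda c, ev: "SUPPORTED" if len(ev) >= 2 else "REFUTED",
    lambda c, ev: "SUPPORTED",
    lambda c, ev: "REFUTED",
]

evidence = progressive_retrieval("example claim", toy_retrieve, max_rounds=3)
verdict = aggregate_verdicts(judges, "example claim", evidence)
print(evidence, verdict)
```

In this toy run the pool grows by one passage per round. Two of the three judges return `SUPPORTED`, so that label wins. An odd number of judges avoids most ties, and the early-stop rule caps retrieval cost once new queries stop surfacing new evidence.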