Sequential Consensus for Multi-Agent LLM Debates: A Wald-SPRT compute governor with calibration-based failure detection

Andrea Morandi

arXiv:2605.19193·cs.LG·May 20, 2026

TL;DR

This paper introduces a Wald-SPRT-based compute governor for multi-agent LLM debates. The governor stops a debate adaptively, round by round, which reduces computational cost while maintaining high accuracy.

Contribution

It adapts Wald's SPRT as a plug-in monitor for LLM debates, providing error guarantees and calibration methods for efficient, adaptive decision-making.

Findings

01. Average debate rounds reduced by 3.7x on GSM8K

02. Achieved 97.0% accuracy with fewer calls compared to fixed rounds

03. Calibrated KL divergence collapses, indicating effective stopping rules

Abstract

Multi-agent LLM debate improves factuality and reasoning, but most recipes pick a fixed round count, over-spending on easy items and under-spending on hard ones. We adapt Wald's Sequential Probability Ratio Test (SPRT) as a plug-in compute governor for LLM debates. After each round, an LLM judge emits a [0,1] consensus score on the latest agent positions; a Wald monitor accumulates the log-likelihood ratio of "useful convergence" vs "not yet useful" under a Beta likelihood family, and stops when either boundary is crossed or returns a capped best-effort outcome at R_max. Under i.i.d. assumptions the rule inherits SPRT type-I/type-II error guarantees; in deployment the calibration itself is the more important object, since it estimates whether the judge score actually separates useful from unhelpful convergence in a given domain. We evaluate two tracks: (i) a Monte-Carlo study under…
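The stopping rule described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the Beta parameters for the two hypotheses, the error rates, and the score clipping are all hypothetical choices made for illustration, whereas the paper obtains its likelihoods through calibration. The boundaries use Wald's standard approximations, log((1 - beta) / alpha) for the upper boundary and log(beta / (1 - alpha)) for the lower one.

```python
import math


def beta_logpdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def wald_boundaries(alpha, beta):
    """Wald's approximate log-likelihood-ratio boundaries.

    alpha is the target type-I error rate, beta the target type-II error rate.
    """
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    return lower, upper


def sprt_monitor(scores, h1=(8.0, 2.0), h0=(2.0, 8.0),
                 alpha=0.05, beta=0.05, r_max=10, eps=1e-6):
    """Accumulate the log-likelihood ratio over per-round judge scores.

    h1 and h0 are hypothetical Beta parameters for "useful convergence" and
    "not yet useful"; they are placeholders, not calibrated values.
    Returns (decision, rounds_used, final_llr).
    """
    lower, upper = wald_boundaries(alpha, beta)
    llr = 0.0
    rounds = 0
    for score in scores:
        if rounds >= r_max:
            break
        rounds += 1
        # Clip so that log(0) cannot occur for scores of exactly 0 or 1.
        s = min(max(score, eps), 1 - eps)
        llr += beta_logpdf(s, *h1) - beta_logpdf(s, *h0)
        if llr >= upper:
            return "accept_h1", rounds, llr
        if llr <= lower:
            return "accept_h0", rounds, llr
    # No boundary crossed: best-effort outcome at the cap (or when scores run out).
    return "capped", rounds, llr


if __name__ == "__main__":
    print(sprt_monitor([0.55, 0.55, 0.55, 0.55]))  # mildly high scores
    print(sprt_monitor([0.5] * 20))                # uninformative scores
```

With these placeholder parameters the two Beta densities are mirror images, so each score contributes 6 * logit(score) to the running sum: a score of 0.5 contributes nothing and the monitor runs until `r_max`, while scores far from 0.5 cross a boundary within a round or two. How well the two hypotheses separate in practice depends on the calibration, which the abstract identifies as the more important object in deployment.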
