Adaptive Consensus in LLM Ensembles via Sequential Evidence Accumulation: Automatic Budget Identification and Calibrated Commit Signals
Roberto E. Medina

TL;DR
This paper introduces DASE, a stopping heuristic for adaptive ensemble deliberation in large language models. It improves accuracy by committing early when it detects consensus and by falling back to a global-frequency vote when evidence is fragmented, and it applies across benchmarks.
Contribution
DASE is a novel adaptive stopping heuristic that generalizes across benchmarks, enabling early consensus commitment and improving ensemble reasoning accuracy.
Findings
DASE yields large routing gaps between commit types (39.5 pp on GPQA-Extended, 25.5 pp on AIME 2010-2023) and improves accuracy across benchmarks.
Adaptive stopping, not bandwidth, primarily drives ensemble accuracy.
Injection-based methods show an inverted-U accuracy trajectory, suggesting new hypotheses.
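In the abstract's terms, the routing gap is the accuracy of right-wall commits minus the accuracy of left-wall commits (for example, 98.3% − 72.8% = 25.5 pp on AIME). The minimal sketch below computes that quantity from per-question records. The `Record` type, the `"right"`/`"left"` labels and the function names are hypothetical illustrations, not the paper's code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Record:
    """One ensemble decision (hypothetical format): its commit type and correctness."""
    commit_type: str  # assumed labels: "right" (right-wall) or "left" (left-wall)
    correct: bool


def accuracy(records: Iterable[Record], commit_type: str) -> float:
    """Accuracy in percent over the records that took the given commit type."""
    subset = [r for r in records if r.commit_type == commit_type]
    if not subset:
        raise ValueError(f"no records with commit type {commit_type!r}")
    # Summing bools counts the correct answers.
    return 100.0 * sum(r.correct for r in subset) / len(subset)


def routing_gap(records: list[Record]) -> float:
    """Routing gap in percentage points: right-wall minus left-wall accuracy."""
    return accuracy(records, "right") - accuracy(records, "left")


# Toy data: 3 of 4 right-wall commits correct, 1 of 4 left-wall commits correct.
toy = [Record("right", True)] * 3 + [Record("right", False)] \
    + [Record("left", True)] + [Record("left", False)] * 3
print(routing_gap(toy))  # 75.0 - 25.0 = 50.0
```

A larger gap means the commit type alone separates reliable answers from unreliable ones, which is what makes it usable as a routing signal.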
Abstract
Large Language Model ensembles improve reasoning accuracy, but only up to a performance boundary beyond which additional deliberation degrades accuracy. We introduce DASE (Deliberative Adaptive Stopping Ensemble), a stopping heuristic for iterative ensemble deliberation that commits early on genuine consensus and applies a global-frequency fallback on fragmented evidence. We make three contributions. (1) DASE produces a commit-type routing partition that generalises across benchmarks and is complementary to verbalized single-call confidence. On GPQA-Extended (N=546, 70B ensemble), the partition yields a 39.5 pp routing gap (right-wall 81.1% vs. left-wall 41.5%). On AIME 2010-2023 (N=261, 120B ensemble, 3 seeds), right-wall commits reach 98.3% accuracy vs. left-wall 72.8% (25.5 pp gap), statistically equivalent to Opus 4.6 Standard verbalized confidence at matched coverage (25.7 pp gap;…
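The abstract describes DASE's control flow only at a high level: commit early on genuine consensus, otherwise fall back to global answer frequency. The sketch below illustrates that shape under stated assumptions. A fixed per-round vote-share `threshold` stands in for DASE's actual consensus criterion, which this summary does not specify. `run_round`, `max_rounds`, `threshold` and the `"consensus"`/`"fallback"` labels are hypothetical. The sketch does not reproduce the paper's sequential evidence accumulation or its right-wall/left-wall partition.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: given a round index, return one answer per ensemble member.
RoundFn = Callable[[int], Sequence[str]]


def adaptive_stop(run_round: RoundFn, max_rounds: int = 5,
                  threshold: float = 0.8) -> tuple[str, str, int]:
    """Return (answer, commit_type, rounds_used).

    Commits early when one answer's share of the current round reaches
    `threshold` (a simple stand-in for a consensus test). If no round reaches
    it within `max_rounds`, it falls back to the answer that is most frequent
    across all rounds (global frequency).
    """
    history: Counter = Counter()
    for r in range(1, max_rounds + 1):
        answers = run_round(r)
        if not answers:
            raise ValueError("a round must return at least one answer")
        round_counts = Counter(answers)
        history.update(round_counts)
        top, top_count = round_counts.most_common(1)[0]
        if top_count / len(answers) >= threshold:
            return top, "consensus", r  # early commit on consensus
    # Budget exhausted with fragmented evidence: global-frequency fallback.
    answer, _ = history.most_common(1)[0]
    return answer, "fallback", max_rounds


# Scripted rounds: consensus ("A" with 3 of 4 votes) is reached in round 2.
scripted = {1: ["A", "B", "C", "A"], 2: ["A", "A", "A", "B"]}
print(adaptive_stop(lambda r: scripted[r], max_rounds=3, threshold=0.75))
```

For the scripted input, the call returns `("A", "consensus", 2)`. If every round stays split, the function returns the globally most frequent answer tagged `"fallback"`. Ties go to the answer first encountered, following `Counter.most_common` ordering.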
