DiscoUQ: Structured Disagreement Analysis for Uncertainty Quantification in LLM Agent Ensembles
Bo Jiang

TL;DR
DiscoUQ is a framework that analyzes the structure of inter-agent disagreement in multi-agent LLM systems, using both linguistic features and embedding geometry, to produce better-calibrated and more robust uncertainty estimates.
Contribution
The paper presents DiscoUQ, a structured disagreement analysis framework that enhances uncertainty quantification in LLM ensembles by utilizing semantic and geometric disagreement features.
Findings
DiscoUQ-LLM achieves an average AUROC of 0.802, outperforming baselines.
DiscoUQ provides well-calibrated confidence estimates with lower ECE.
Features generalize across diverse benchmarks with minimal performance loss.
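For readers unfamiliar with the two metrics named in the findings, the sketch below computes AUROC and ECE from per-question confidence scores and correctness labels. It uses the standard textbook definitions (equal-width confidence bins for ECE, the pairwise ranking form for AUROC); the function names and the choice of 10 bins are assumptions for illustration, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: weighted mean of |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece


def auroc(confidences, correct):
    """Probability that a correct answer outranks an incorrect one (ties count 0.5)."""
    pos = [c for c, y in zip(confidences, correct) if y == 1]
    neg = [c for c, y in zip(confidences, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: confidence in the ensemble answer, and whether that answer was right.
    conf = [0.9, 0.8, 0.3, 0.6]
    right = [1, 1, 0, 0]
    print("AUROC:", auroc(conf, right))
    print("ECE:", expected_calibration_error(conf, right))
```

A higher AUROC means the confidence score separates correct from incorrect ensemble answers more cleanly, while a lower ECE means the stated confidence tracks the observed accuracy more closely.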
Abstract
Multi-agent LLM systems, where multiple prompted instances of a language model independently answer questions, are increasingly used for complex reasoning tasks. However, existing methods for quantifying the uncertainty of their collective outputs rely on shallow voting statistics that discard the rich semantic information in agents' reasoning. We introduce DiscoUQ, a framework that extracts and leverages the structure of inter-agent disagreement -- both linguistic properties (evidence overlap, argument strength, divergence depth) and embedding geometry (cluster distances, dispersion, cohesion) -- to produce well-calibrated confidence estimates. We propose three methods of increasing complexity: DiscoUQ-LLM (logistic regression on LLM-extracted structure features), DiscoUQ-Embed (logistic regression on embedding geometry), and DiscoUQ-Learn (a neural network combining all features).…
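To make the embedding-geometry idea concrete, here is a minimal sketch of how cluster distance, dispersion, and cohesion might be computed for one question and passed through a logistic model. Everything specific in it is an assumption: the feature definitions, the names `disagreement_features` and `logistic_confidence`, and the weights are hypothetical and are not taken from the paper, which may define these quantities differently.

```python
import math
from collections import Counter


def _mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def _distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def disagreement_features(answers, embeddings):
    """Hypothetical geometry features for one question.

    answers: final answer per agent; embeddings: one vector per agent's reasoning.
    """
    majority_answer, majority_count = Counter(answers).most_common(1)[0]
    majority = [e for a, e in zip(answers, embeddings) if a == majority_answer]
    minority = [e for a, e in zip(answers, embeddings) if a != majority_answer]

    centroid_all = _mean_vector(embeddings)
    centroid_major = _mean_vector(majority)

    # Dispersion (assumed): mean distance of every agent to the global centroid.
    dispersion = sum(_distance(e, centroid_all) for e in embeddings) / len(embeddings)
    # Cohesion (assumed): 1 / (1 + mean distance within the majority cluster).
    spread = sum(_distance(e, centroid_major) for e in majority) / len(majority)
    cohesion = 1.0 / (1.0 + spread)
    # Cluster distance (assumed): majority vs. minority centroids, 0 if unanimous.
    cluster_distance = (
        _distance(centroid_major, _mean_vector(minority)) if minority else 0.0
    )
    return {
        "vote_fraction": majority_count / len(answers),
        "dispersion": dispersion,
        "cohesion": cohesion,
        "cluster_distance": cluster_distance,
    }


def logistic_confidence(features, weights, bias):
    """Map features to a confidence in (0, 1) with a logistic model."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    answers = ["A", "A", "A", "B"]
    embeddings = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
    feats = disagreement_features(answers, embeddings)
    # Placeholder weights; in practice they would be fit on labeled questions.
    weights = {"vote_fraction": 2.0, "dispersion": -0.5,
               "cohesion": 1.0, "cluster_distance": -0.2}
    print(feats, logistic_confidence(feats, weights, bias=-1.0))
```

In a real pipeline the weights would come from fitting a logistic regression on questions with known correctness, rather than being set by hand as in this toy example.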
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)
