Consensus is Not Verification: Why Crowd Wisdom Strategies Fail for LLM Truthfulness

Yegor Denisov-Blanch; Joshua Kazdan; Jessica Chudnovsky; Rylan Schaeffer; Sheng Guan; Soji Adeshina; Sanmi Koyejo

arXiv:2603.06612 · cs.LG · March 10, 2026

TL;DR

Scaling inference compute with crowd-wisdom strategies does not improve truthfulness in unverified domains: models tend to share misconceptions, so their errors are highly correlated, which limits what aggregation methods can achieve.

Contribution

This paper demonstrates that increasing inference compute does not enhance truthfulness in unverified domains due to correlated errors and shared misconceptions among models.

Findings

01. Polling-style aggregation yields no consistent accuracy gains over single-sample baselines.
02. Under uncertainty, models are better at predicting other models' outputs than at identifying what is true.
03. Error correlation persists even with random or out-of-distribution inputs.
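The findings on correlated errors can be made concrete by comparing how often two models agree with how often each is correct, and by correlating their per-question error indicators. The following is a minimal sketch, assuming each model's answers are strings aligned to a shared list of gold answers; the function names, the phi-correlation metric, and the toy answers are hypothetical illustrations, not the paper's metrics or data.

```python
import math


def accuracy(answers, gold):
    """Fraction of questions a model answers correctly."""
    return sum(a == g for a, g in zip(answers, gold)) / len(gold)


def agreement_rate(answers_a, answers_b):
    """Fraction of questions on which two models give the same answer, right or wrong."""
    return sum(a == b for a, b in zip(answers_a, answers_b)) / len(answers_a)


def error_correlation(answers_a, answers_b, gold):
    """Pearson (phi) correlation between two models' 0/1 error indicators.

    Returns NaN when either model is always right or always wrong, since the
    correlation is undefined for a constant vector.
    """
    errs_a = [float(a != g) for a, g in zip(answers_a, gold)]
    errs_b = [float(b != g) for b, g in zip(answers_b, gold)]
    mean_a = sum(errs_a) / len(errs_a)
    mean_b = sum(errs_b) / len(errs_b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(errs_a, errs_b))
    var_a = sum((x - mean_a) ** 2 for x in errs_a)
    var_b = sum((y - mean_b) ** 2 for y in errs_b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy answers to four questions (not data from the paper).
gold = ["A", "B", "C", "D"]
model_x = ["A", "C", "C", "A"]  # wrong on questions 2 and 4
model_y = ["A", "C", "C", "A"]  # makes exactly the same mistakes
model_z = ["B", "B", "C", "D"]  # wrong only on question 1

print(accuracy(model_x, gold), agreement_rate(model_x, model_y))
print(error_correlation(model_x, model_y, gold), error_correlation(model_x, model_z, gold))
```

In this toy case model_x and model_y agree on every question while each is only 50% accurate, and their errors are perfectly correlated; a second model that reliably predicts the first one's answers says nothing about whether those answers are true. Model_z errs on a different question, so its error correlation with model_x is negative.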

Abstract

Pass@k and other methods of scaling inference compute can improve language model performance in domains with external verifiers, including mathematics and code, where incorrect candidates can be filtered reliably. This raises a natural question: can we similarly scale compute to elicit gains in truthfulness for domains without convenient verification? We show that across five benchmarks and models, surprisingly, it cannot. Even at 25x the inference cost of naive sampling, polling-style aggregation yields no consistent accuracy gains over single-sample baselines and often amplifies shared misconceptions. We find that under uncertainty, models are better at predicting what other models will say within model ensembles than at identifying what is true, revealing a separation between social prediction and truth verification. Across models and benchmarks, aggregation fails to provide a robust…
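The contrast the abstract draws between verifier-filtered scaling (as with Pass@k in mathematics and code) and polling-style aggregation can be illustrated with a toy simulation. This is a minimal sketch, not the paper's experimental setup: it assumes every question has the same fixed answer distribution in which one wrong "misconception" answer is more likely than the truth, and it assumes a perfect verifier for the filtered case. The function names and probabilities are hypothetical.

```python
import random
from collections import Counter


def majority_vote(samples):
    """Polling-style aggregation: the most frequent answer wins (ties go to the first seen)."""
    return Counter(samples).most_common(1)[0][0]


def verified_pass_at_k(samples, is_correct):
    """Verifier-filtered sampling: success if any candidate passes the external check."""
    return any(is_correct(s) for s in samples)


def simulate(answer_probs, truth, k, n_questions=2000, seed=0):
    """Estimate single-sample, majority-vote, and verified pass@k accuracy.

    answer_probs maps each candidate answer to the probability that the model
    emits it. All k samples come from this one distribution, so whenever the
    most likely answer is wrong, the samples share that same mistake.
    """
    rng = random.Random(seed)
    answers = list(answer_probs)
    weights = [answer_probs[a] for a in answers]
    single = vote = verified = 0
    for _ in range(n_questions):
        samples = rng.choices(answers, weights=weights, k=k)
        single += samples[0] == truth
        vote += majority_vote(samples) == truth
        verified += verified_pass_at_k(samples, lambda s: s == truth)
    return single / n_questions, vote / n_questions, verified / n_questions


single, vote, verified = simulate(
    {"truth": 0.3, "misconception": 0.5, "other": 0.2}, truth="truth", k=25
)
print(f"single={single:.3f} vote={vote:.3f} verified={verified:.3f}")
```

Under these assumed probabilities, voting over 25 samples does worse than a single sample because it concentrates on the most likely answer, which here is the misconception, while a perfect verifier almost always finds a correct candidate among the 25. Without a verifier, only the voting path is available, which is the setting the paper studies.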

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Computational and Text Analysis Methods