Consensus is Not Verification: Why Crowd Wisdom Strategies Fail for LLM Truthfulness
Yegor Denisov-Blanch, Joshua Kazdan, Jessica Chudnovsky, Rylan Schaeffer, Sheng Guan, Soji Adeshina, Sanmi Koyejo

TL;DR
Scaling inference compute with crowd-wisdom strategies does not improve truthfulness in domains without external verification: models share misconceptions and their errors are highly correlated, which leaves aggregation methods with little independent signal to exploit.
Contribution
The paper shows that increasing inference compute does not improve truthfulness in domains without external verifiers, because errors are correlated and misconceptions are shared across models.
Findings
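Polling-style aggregation yields no consistent accuracy gains over single-sample baselines and often amplifies shared misconceptions.
Under uncertainty, models are better at predicting what other models will say than at identifying what is true.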
Error correlation persists even with random or out-of-distribution inputs.
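Why correlated errors matter for aggregation can be illustrated with a toy simulation. This is a minimal sketch, not the paper's experimental setup: it assumes binary answers, a hypothetical per-sample accuracy `p_correct`, and compares fully independent samples against the extreme case in which every sample in the ensemble shares the same error. All function names and parameter values are illustrative assumptions.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Return the most common answer in a list of sampled answers."""
    return Counter(answers).most_common(1)[0][0]


def simulate(n_questions, n_samples, p_correct, shared, seed=0):
    """Compare single-sample accuracy with majority-vote accuracy.

    shared=False: each sample is right or wrong independently.
    shared=True:  one draw decides the whole ensemble (perfectly
                  correlated errors, an assumed extreme case).
    Returns (single_sample_accuracy, majority_vote_accuracy).
    """
    rng = random.Random(seed)
    single_hits = 0
    vote_hits = 0
    for _ in range(n_questions):
        if shared:
            ok = rng.random() < p_correct
            answers = ["true" if ok else "misconception"] * n_samples
        else:
            answers = [
                "true" if rng.random() < p_correct else "misconception"
                for _ in range(n_samples)
            ]
        single_hits += answers[0] == "true"
        vote_hits += majority_vote(answers) == "true"
    return single_hits / n_questions, vote_hits / n_questions


if __name__ == "__main__":
    # 25 samples per question and 60% per-sample accuracy are arbitrary choices.
    for shared in (False, True):
        single, vote = simulate(2000, 25, 0.6, shared=shared)
        label = "shared errors" if shared else "independent errors"
        print(f"{label}: single={single:.3f}, majority vote={vote:.3f}")
```

Under these assumptions, majority voting raises accuracy when errors are independent and leaves it unchanged when errors are fully shared. The sketch only marks the two ends of that range; it does not estimate where any real model ensemble falls.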
Abstract
Pass@k and other methods of scaling inference compute can improve language model performance in domains with external verifiers, including mathematics and code, where incorrect candidates can be filtered reliably. This raises a natural question: can we similarly scale compute to elicit gains in truthfulness for domains without convenient verification? We show that across five benchmarks and models, surprisingly, it cannot. Even at 25x the inference cost of naive sampling, polling-style aggregation yields no consistent accuracy gains over single-sample baselines and often amplifies shared misconceptions. We find that under uncertainty, models are better at predicting what other models will say within model ensembles than at identifying what is true, revealing a separation between social prediction and truth verification. Across models and benchmarks, aggregation fails to provide a robust…
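For context on why pass@k depends on a verifier, the sketch below uses the commonly cited unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn and c is the number that pass verification. This formula is not taken from this paper, and the function name `pass_at_k` is an illustrative assumption. The point is that c can only be counted when an external check can label each candidate as correct or incorrect.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples is correct.

    n: total samples drawn for a problem
    c: number of those samples that a verifier marked correct
    k: sampling budget being evaluated (k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 25 samples, 3 verified correct.
    for k in (1, 5, 25):
        print(k, round(pass_at_k(25, 3, k), 3))
```

Without a verifier there is no way to obtain c, which is why domains lacking verification have to fall back on aggregation schemes such as polling.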
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Computational and Text Analysis Methods
