Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy

Philipp Schoenegger; Indre Tuminauskaite; Peter S. Park; Philip E. Tetlock

arXiv:2402.19379 · cs.CY · July 23, 2024 · 2 cites

TL;DR

This study demonstrates that ensembles of large language models can match human crowd forecasting accuracy through simple aggregation, highlighting their potential for societal applications.

Contribution

The paper shows that aggregating predictions from multiple LLMs can rival human crowd forecasts, extending the 'wisdom of the crowd' to artificial models.

Findings

01. LLM ensemble outperforms no-information benchmark

02. LLM ensemble is statistically comparable to human crowd

03. Forecast accuracy improves when LLMs incorporate human predictions

Abstract

Human forecasting accuracy in practice relies on the 'wisdom of the crowd' effect, in which predictions about future events are significantly improved by aggregating across a crowd of individual forecasters. Past work on the forecasting ability of large language models (LLMs) suggests that frontier LLMs, as individual forecasters, underperform the gold standard of a human crowd forecasting tournament aggregate. In Study 1, we expand this research by using an LLM ensemble approach consisting of a crowd of twelve LLMs. We compare the aggregated LLM predictions on 31 binary questions to those of a crowd of 925 human forecasters from a three-month forecasting tournament. Our preregistered main analysis shows that the LLM crowd outperforms a simple no-information benchmark and is not statistically different from the human crowd. In exploratory analyses, we find that these two…
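The comparison described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the text above does not confirm: that the "simple aggregation" is a per-question median, that accuracy is scored with the Brier score, and that the no-information benchmark is a constant 50% forecast. The forecast values and the helper names (`crowd_forecast`, `mean_brier`) are hypothetical and are not taken from the study.

```python
from statistics import median

def brier_score(prob, outcome):
    """Squared error between a probability forecast and a 0/1 outcome."""
    return (prob - outcome) ** 2

def crowd_forecast(probs):
    """Aggregate individual forecasts for one question (assumed: median)."""
    return median(probs)

def mean_brier(forecasts, outcomes):
    """Average Brier score over a set of binary questions (lower is better)."""
    scores = [brier_score(p, o) for p, o in zip(forecasts, outcomes)]
    return sum(scores) / len(scores)

# Hypothetical data: each row holds several models' probabilities for one question.
llm_forecasts = [
    [0.70, 0.80, 0.65, 0.90],  # question 1
    [0.20, 0.35, 0.10, 0.30],  # question 2
    [0.55, 0.60, 0.40, 0.75],  # question 3
]
outcomes = [1, 0, 1]  # hypothetical resolutions (1 = event occurred)

# One aggregated "crowd" probability per question.
crowd = [crowd_forecast(q) for q in llm_forecasts]

# Assumed no-information benchmark: always predict 50%.
benchmark = [0.5] * len(outcomes)

crowd_score = mean_brier(crowd, outcomes)
benchmark_score = mean_brier(benchmark, outcomes)

print(f"LLM crowd mean Brier score: {crowd_score:.4f}")
print(f"Benchmark mean Brier score: {benchmark_score:.4f}")
```

A constant 50% forecast always scores 0.25 under the Brier score, so an aggregate scoring below 0.25 on resolved questions beats that benchmark. Whether a difference from a human crowd is statistically meaningful would need a separate test, which this sketch does not attempt.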

Taxonomy

Topics: Data Stream Mining Techniques · Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting

Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Byte Pair Encoding · Multi-Head Attention · Softmax · Dense Connections · Label Smoothing · Adam