Approaching Human-Level Forecasting with Language Models
Danny Halawi, Fred Zhang, Chen Yueh-Han, Jacob Steinhardt

TL;DR
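This paper studies whether language models can forecast future events at the level of competitive human forecasters. It builds a retrieval-augmented system and evaluates it on questions published after the models' knowledge cut-offs. On average, the system nears the crowd aggregate of human forecasters, and in some settings surpasses it.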
Contribution
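The paper introduces a retrieval-augmented language model system that automatically searches for relevant information, generates forecasts, and aggregates predictions. It also collects a large dataset of questions from competitive forecasting platforms and uses it to compare the system against aggregates of human forecasts.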
Findings
On average, the system nears the crowd aggregate of competitive forecasters
In some settings, the system surpasses the crowd aggregate
The results suggest that LM forecasting could provide accurate predictions at scale
Abstract
Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. Under a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters, and in some settings surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help to inform institutional decision making.
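The abstract names three stages: search, forecast generation, and aggregation, followed by a comparison against the human crowd. As a minimal sketch of the last two steps only, the Python below aggregates several sampled probabilities for one binary question and scores forecasts against resolved outcomes. It assumes a trimmed mean as the aggregation rule and the Brier score as the metric; the abstract states neither, and the function names and sample numbers are hypothetical illustrations, not the authors' implementation.

```python
def trimmed_mean(probs, trim=1):
    """Average forecasts after dropping the `trim` lowest and highest values.

    Assumption: trimmed mean is one plausible aggregation rule; the
    abstract does not specify which rule the system uses.
    """
    if not probs:
        raise ValueError("need at least one forecast")
    ordered = sorted(probs)
    # Only trim when enough forecasts remain afterwards.
    if len(ordered) > 2 * trim:
        ordered = ordered[trim:len(ordered) - trim]
    return sum(ordered) / len(ordered)


def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have equal length")
    if not forecasts:
        raise ValueError("need at least one resolved question")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical sampled forecasts for one question that resolved "yes" (1).
samples = [0.1, 0.6, 0.7, 0.8, 0.99]
system_forecast = trimmed_mean(samples)

# Hypothetical comparison on two resolved questions.
system_score = brier_score([system_forecast, 0.2], [1, 0])
crowd_score = brier_score([0.8, 0.1], [1, 0])
```

Under these assumptions, the system matches or beats the crowd on a set of questions when its score is less than or equal to the crowd's score.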
Taxonomy
Topics: demographic modeling and climate adaptation · Data Quality and Management · Speech and dialogue systems
Methods: Sparse Evolutionary Training
