Can Language Models Use Forecasting Strategies?
Sarah Pratt, Seth Blumberg, Pietro Kreitlon Carolino, Meredith Ringel Morris

TL;DR
This paper evaluates large language models' ability to forecast real-world events, revealing their current limitations and highlighting the need for improved methods to assess and enhance their predictive capabilities.
Contribution
Introduces a novel dataset and evaluation framework for assessing LLM forecasting skills, and analyzes the models' performance against human predictions.
Findings
Models struggle to make accurate future predictions.
LLMs tend to underestimate event likelihoods.
A performance gap between LLM forecasters and human forecasters is identified.
Abstract
Advances in deep learning systems have allowed large models to match or surpass human accuracy on a number of skills such as image classification, basic programming, and standardized test taking. As the performance of the most capable models begins to saturate on tasks where humans already achieve high accuracy, it becomes necessary to benchmark models on increasingly complex abilities. One such task is forecasting the future outcome of events. In this work we describe experiments using a novel dataset of real-world events and associated human predictions, an evaluation metric to measure forecasting ability, and the accuracy of a number of different LLM-based forecasting designs on the provided dataset. Additionally, we analyze the performance of the LLM forecasters against human predictions and find that models still struggle to make accurate predictions about the future. Our follow-up…
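The abstract mentions an evaluation metric for forecasting ability but this summary does not name it. As a minimal sketch, assuming a Brier score (a common choice for probabilistic forecasts of binary events, not confirmed here as the paper's metric), the comparison between model and human forecasts could look like the following. The function names and the sample numbers are hypothetical and purely illustrative; they are not data from the paper.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; always answering 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def mean_bias(forecasts, outcomes):
    """Average forecast minus observed event rate.

    A negative value means event likelihoods were underestimated on average.
    """
    n = len(forecasts)
    return sum(forecasts) / n - sum(outcomes) / n


# Hypothetical example values, made up for illustration only.
outcomes = [1, 1, 0]           # 1 = event happened, 0 = it did not
model_probs = [0.2, 0.4, 0.1]  # illustrative LLM forecasts
human_probs = [0.7, 0.8, 0.2]  # illustrative human forecasts

print("model Brier:", brier_score(model_probs, outcomes))
print("human Brier:", brier_score(human_probs, outcomes))
print("model bias: ", mean_bias(model_probs, outcomes))
```

In this made-up example the model's score is worse (higher) than the humans' and its bias is negative, which is the kind of pattern the findings describe: lower accuracy than human forecasters and a tendency to underestimate event likelihoods.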
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
