Bench to the Future: A Pastcasting Benchmark for Forecasting Agents
FutureSearch: Jack Wildman, Nikos I. Bosse, Daniel Hnyk, Peter Mühlbacher, Finn Hambly, Jon Evans, Dan Schwarz, Lawrence Phillips

TL;DR
Bench To the Future (BTF) is a novel 'pastcasting' benchmark that uses known past events and extensive web data to evaluate and track the forecasting capabilities of large language models over time.
Contribution
This paper introduces BTF, a realistic, repeatable benchmark for forecasting with known outcomes, enabling consistent evaluation of LLMs' forecasting abilities.
Findings
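Results suggest that forecasts made in BTF's offline pastcasting environment are comparable to forecasts made with live internet access on questions that were unresolved at the time.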
Benchmarking shows steady progress in LLM forecasting capabilities.
BTF can track improvements over different model versions and approaches.
Abstract
Forecasting is a challenging task that offers a clearly measurable way to study AI systems. Forecasting requires a large amount of research on the internet, and evaluations require time for events to happen, making the development of forecasting benchmarks challenging. To date, no forecasting benchmark provides a realistic, hermetic, and repeatable environment for LLM forecasters. We introduce Bench To the Future (BTF), a "pastcasting" benchmark with hundreds of high-quality questions for which the resolution is already known. Each question is accompanied by a large offline corpus of tens of thousands of relevant web pages, making it possible to elicit realistic "forecasts" on past events from LLMs. Results suggest that our pastcasting environment can produce results comparable to those from forecasts made with live internet access on questions that were unresolved at the time. We show results benchmarking…
Taxonomy
Topics: Forecasting Techniques and Applications · Mobile Crowdsensing and Crowdsourcing · Stock Market Forecasting Methods
