FOReCAst: The Future Outcome Reasoning and Confidence Assessment Benchmark

Zhangdie Yuan; Zifeng Ding; Andreas Vlachos

arXiv:2502.19676 · cs.LG · May 19, 2025


Open Access 1 Datasets

TL;DR

FOReCAst is a new benchmarking framework that assesses models' forecasting accuracy and confidence calibration across diverse real-world scenarios, addressing limitations of previous benchmarks.

Contribution

The paper introduces FOReCAst, a benchmark that covers multiple forecasting question types together with confidence assessment, filling gaps left by existing evaluation methods.

Findings

01

Evaluates models on Boolean, timeframe, and quantity questions.

02

Assesses both prediction accuracy and confidence calibration.

03

Targets questions aligned with real-world human forecasting needs rather than artificial ones.

Abstract

Forecasting is an important task in many domains, such as technology and economics. However, existing forecasting benchmarks largely lack comprehensive confidence assessment, focus on limited question types, and often consist of artificial questions that do not align with real-world human forecasting needs. To address these gaps, we introduce FOReCAst (Future Outcome Reasoning and Confidence Assessment), a benchmark that evaluates models' ability to make predictions and their confidence in them. FOReCAst spans diverse forecasting scenarios involving Boolean questions, timeframe prediction, and quantity estimation, enabling a comprehensive evaluation of both prediction accuracy and confidence calibration for real-world applications.
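To make the distinction between prediction accuracy and confidence calibration concrete, here is a minimal sketch for the Boolean question type. It scores a set of forecasts with plain accuracy and with the Brier score, a standard calibration-sensitive measure. This is an illustration under stated assumptions, not the benchmark's own evaluation code: the abstract does not say which metrics FOReCAst uses, and the function names, the 0.5 decision threshold, and the sample numbers are all hypothetical.

```python
from typing import List


def accuracy(confidences: List[float], outcomes: List[int], threshold: float = 0.5) -> float:
    """Fraction of Boolean questions answered correctly.

    `confidences` holds the stated probability of "yes" for each question;
    `outcomes` holds the resolved answer (1 = yes, 0 = no).
    The 0.5 threshold is an assumption made for this sketch.
    """
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    predictions = [1 if p >= threshold else 0 for p in confidences]
    return sum(int(pred == y) for pred, y in zip(predictions, outcomes)) / len(outcomes)


def brier_score(confidences: List[float], outcomes: List[int]) -> float:
    """Mean squared gap between stated probability and the 0/1 outcome.

    Lower is better; 0.0 means every forecast was both correct and fully confident.
    """
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(confidences, outcomes)) / len(confidences)


if __name__ == "__main__":
    # Hypothetical forecasts, not data from the benchmark.
    stated = [0.9, 0.2, 0.7, 0.4]
    resolved = [1, 0, 0, 1]
    print("accuracy:", accuracy(stated, resolved))
    print("brier:", brier_score(stated, resolved))
```

Two forecasters can reach the same accuracy while differing in Brier score, because the Brier score also penalizes overconfident misses. That is the kind of gap a confidence assessment is meant to expose.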


Code & Models

Datasets

MoyYuan/FOReCAst
dataset · 63 dl


Taxonomy

Topics: Evaluation and Performance Assessment

Methods: Focus · ALIGN