FOReCAst: The Future Outcome Reasoning and Confidence Assessment Benchmark
Zhangdie Yuan, Zifeng Ding, Andreas Vlachos

TL;DR
FOReCAst is a new benchmarking framework that assesses models' forecasting accuracy and confidence calibration across diverse real-world scenarios, addressing limitations of previous benchmarks.
Contribution
It introduces FOReCAst, a benchmark that covers three forecasting question types (Boolean, timeframe, and quantity) and pairs each prediction with a confidence assessment, filling gaps in existing evaluation methods.
Findings
Evaluates models on Boolean, timeframe, and quantity questions.
Assesses both prediction accuracy and confidence calibration.
Provides a more realistic and comprehensive forecasting benchmark.
Abstract
Forecasting is an important task in many domains, such as technology and economics. However, existing forecasting benchmarks largely lack comprehensive confidence assessment, focus on a limited range of question types, and often consist of artificial questions that do not align with real-world human forecasting needs. To address these gaps, we introduce FOReCAst (Future Outcome Reasoning and Confidence Assessment), a benchmark that evaluates both models' predictions and their confidence in those predictions. FOReCAst spans diverse forecasting scenarios involving Boolean questions, timeframe prediction, and quantity estimation, enabling a comprehensive evaluation of both prediction accuracy and confidence calibration for real-world applications.
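The abstract does not say which calibration metrics FOReCAst reports. As a rough illustration of what "confidence calibration" means for Boolean questions, here is a minimal Python sketch of two standard measures: the Brier score and binned expected calibration error (ECE). The `BooleanForecast` record and its fields are hypothetical and are not the benchmark's data format, and these metrics are not necessarily the ones used in the paper.

```python
from dataclasses import dataclass


@dataclass
class BooleanForecast:
    # Hypothetical record, not FOReCAst's actual schema.
    predicted: bool    # the model's yes/no answer
    confidence: float  # stated probability that the answer is correct, in [0, 1]
    actual: bool       # the resolved real-world outcome


def accuracy(forecasts):
    """Fraction of forecasts whose answer matches the resolved outcome."""
    return sum(f.predicted == f.actual for f in forecasts) / len(forecasts)


def brier_score(forecasts):
    """Mean squared gap between stated confidence and correctness (1 or 0)."""
    return sum(
        (f.confidence - float(f.predicted == f.actual)) ** 2 for f in forecasts
    ) / len(forecasts)


def expected_calibration_error(forecasts, n_bins=10):
    """Binned ECE: size-weighted mean of |bin accuracy - bin mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for f in forecasts:
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(f.confidence * n_bins), n_bins - 1)
        bins[idx].append(f)
    total = len(forecasts)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        bin_acc = sum(f.predicted == f.actual for f in b) / len(b)
        bin_conf = sum(f.confidence for f in b) / len(b)
        ece += (len(b) / total) * abs(bin_acc - bin_conf)
    return ece
```

For example, a model that is right on two of four questions but reports 0.6 to 0.9 confidence on each gets an accuracy of 0.5 yet a Brier score of 0.325 and an ECE of 0.5, which shows why accuracy alone does not capture overconfidence. Lower Brier and ECE values indicate better calibration.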
Taxonomy
Topics: Evaluation and Performance Assessment
Methods: Focus · ALIGN
