OracleProto: A Reproducible Framework for Benchmarking LLM Native Forecasting via Knowledge Cutoff and Temporal Masking

Yiding Ma; Chengyun Ruan; Kaibo Huang; Zhongliang Yang; Linna Zhou

arXiv:2605.03762 · cs.AI · May 6, 2026

TL;DR

OracleProto is a reproducible framework that evaluates large language models' forecasting abilities by reconstructing past events into time-bound samples, enabling fair comparison and reducing information leakage.

Contribution

OracleProto introduces a reproducible benchmarking method that separates genuine forecasting from recall of facts learned during pretraining, with controlled information leakage and hierarchical scoring.

Findings

01 Distinguishes forecasting quality, stability, and efficiency across models.

02 Reduces residual information leakage to below 1%.

03 Provides a reusable, auditable dataset for model evaluation.

Abstract

Large language models are moving from static text generators toward real-world decision-support systems, where forecasting is a composite capability that links information gathering, evidence integration, situational judgment, and action-oriented decision making. This capability is in broad demand across finance, policy, industry, and scientific research, yet its evaluation remains difficult: live benchmarks evaluate forecasts before answers exist, making them the cleanest way to measure forecasting ability, but they expire once events resolve; retrospective benchmarks are reproducible, but they cannot reliably distinguish genuine forecasting from facts a model may have already learned during pretraining. Prompting models to "pretend not to know" cannot replace a genuine knowledge boundary. We propose OracleProto, a reproducible framework for evaluating LLM native forecasting…
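The knowledge-boundary idea in the abstract can be illustrated with a minimal Python sketch. It is not taken from the OracleProto repository: the names `Evidence`, `ForecastSample`, and `mask_sample`, and the two rules used here (drop questions that resolved on or before the model's knowledge cutoff, and hide evidence published after the question date), are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence item with a publication date."""
    published: date
    text: str


@dataclass(frozen=True)
class ForecastSample:
    """Hypothetical time-bound sample built from a past event."""
    question: str
    asked_on: date      # point in time the forecast is made from
    resolved_on: date   # date the real-world outcome became known
    evidence: Tuple[Evidence, ...]


def mask_sample(sample: ForecastSample,
                knowledge_cutoff: date) -> Optional[ForecastSample]:
    """Return a masked copy of the sample, or None if it is unusable.

    Assumed rule 1: if the event resolved on or before the model's
    knowledge cutoff, the answer may be in the pretraining data, so the
    sample is dropped for that model.
    Assumed rule 2: evidence published after the question date is hidden,
    so the model only sees what a forecaster could have seen then.
    """
    if sample.resolved_on <= knowledge_cutoff:
        return None
    visible = tuple(e for e in sample.evidence
                    if e.published <= sample.asked_on)
    return ForecastSample(sample.question, sample.asked_on,
                          sample.resolved_on, visible)


# Usage: one question, two evidence items, two hypothetical model cutoffs.
sample = ForecastSample(
    question="Will event X happen by June 2025?",
    asked_on=date(2025, 3, 1),
    resolved_on=date(2025, 6, 1),
    evidence=(
        Evidence(date(2025, 2, 1), "report available before the question date"),
        Evidence(date(2025, 4, 1), "report published after the question date"),
    ),
)
usable = mask_sample(sample, knowledge_cutoff=date(2024, 12, 31))
dropped = mask_sample(sample, knowledge_cutoff=date(2025, 6, 1))
```

Under these assumptions, the same question is usable for a model whose cutoff precedes the event's resolution and is excluded for a model whose cutoff does not, which is the distinction that a "pretend not to know" prompt cannot enforce.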

Code & Models

Repositories

MaYiding/OracleProto
github

Datasets

MaYiding/OracleProto
dataset · 197 dl
