All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting

Zeyu Zhang; Ryan Chen; Bradly C. Stadie

arXiv:2602.17234 · cs.AI · February 20, 2026



TL;DR

This paper introduces an interpretable framework for detecting and quantifying temporal knowledge leakage in LLM backtesting, proposing a new method that improves the reliability of retrospective evaluations by filtering out post-cutoff information.

Contribution

The paper presents a claim-level framework that uses Shapley values to detect leakage, and introduces TimeSPEC, a proactive approach for filtering temporal contamination out of LLM predictions.

Findings

01. Standard prompts exhibit substantial temporal leakage.

02. TimeSPEC effectively reduces leakage while maintaining task performance.

03. The framework provides interpretable insights into decision-driving information.

Abstract

To evaluate whether LLMs can accurately predict future events, we need the ability to backtest them on events that have already resolved. This requires models to reason only with information available at a specified past date. Yet LLMs may inadvertently leak post-cutoff knowledge encoded during training, undermining the validity of retrospective evaluation. We introduce a claim-level framework for detecting and quantifying this temporal knowledge leakage. Our approach decomposes model rationales into atomic claims and categorizes them by temporal verifiability, then applies Shapley values to measure each claim's contribution to the prediction. This yields the Shapley-weighted Decision-Critical Leakage Rate (Shapley-DCLR), an interpretable metric that captures what fraction of decision-driving reasoning…
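As a rough illustration of the idea in the abstract, the sketch below computes exact Shapley values for a handful of claims and then measures how much of the total attribution falls on claims flagged as leaked. The abstract is cut off before the metric is fully defined, so the aggregation used here (absolute Shapley mass on leaked claims divided by total absolute mass) is an assumption and not the paper's formula. The function names, the toy additive value function, and the leak flags are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for n claims.

    `value` maps a frozenset of claim indices to a float, e.g. the model's
    confidence in its prediction when only those claims are kept.
    Cost is exponential in n, so this is only practical for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Standard Shapley weight for coalitions of size k that exclude i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def shapley_weighted_leakage_rate(phi, leaked):
    """ASSUMED aggregation: share of absolute Shapley mass on leaked claims."""
    total = sum(abs(p) for p in phi)
    if total == 0:
        return 0.0
    return sum(abs(p) for p, is_leaked in zip(phi, leaked) if is_leaked) / total


# Hypothetical toy setup: three claims with additive contributions.
CONTRIB = [0.3, 0.1, 0.2]


def toy_value(subset):
    return sum(CONTRIB[i] for i in subset)


phi = shapley_values(3, toy_value)
# Hypothetical labels: only the first claim relies on post-cutoff knowledge.
rate = shapley_weighted_leakage_rate(phi, [True, False, False])
print(round(rate, 3))  # 0.5
```

Because the toy value function is additive, each claim's Shapley value equals its own contribution, which makes the result easy to check by hand. In a real setting the value function would come from re-querying the model with subsets of claims, and an exact computation would usually be replaced by sampling.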


Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Auditing, Earnings Management, Governance · Artificial Intelligence in Law