All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting
Zeyu Zhang, Ryan Chen, Bradly C. Stadie

TL;DR
This paper introduces an interpretable framework for detecting and quantifying temporal knowledge leakage in LLM backtesting, and proposes TimeSPEC, a method that makes retrospective evaluations more reliable by filtering out post-cutoff information.
Contribution
The paper presents a claim-level framework that uses Shapley values to detect leakage, and introduces TimeSPEC, a proactive approach to filtering temporal contamination out of LLM predictions.
Findings
Standard prompts exhibit substantial temporal leakage.
TimeSPEC effectively reduces leakage while maintaining task performance.
The framework provides interpretable insights into decision-driving information.
Abstract
To evaluate whether LLMs can accurately predict future events, we need the ability to backtest them on events that have already resolved. This requires models to reason only with information available at a specified past date. Yet LLMs may inadvertently leak post-cutoff knowledge encoded during training, undermining the validity of retrospective evaluation. We introduce a claim-level framework for detecting and quantifying this temporal knowledge leakage. Our approach decomposes model rationales into atomic claims and categorizes them by temporal verifiability, then applies Shapley values to measure each claim's contribution to the prediction. This yields the Shapley-weighted Decision-Critical Leakage Rate (Shapley-DCLR), an interpretable metric that captures what fraction of decision-driving reasoning…
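The attribution step described in the abstract can be illustrated with a small sketch. It computes exact Shapley values over a handful of claims and then reports the share of absolute attribution mass that falls on claims flagged as leaked. Everything here is an assumption made for illustration: the abstract is truncated before it defines Shapley-DCLR, so `leakage_weighted_rate` is only one plausible reading of the metric, and `toy_value`, `CLAIM_WEIGHTS`, and `LEAKED` are hypothetical stand-ins for a real model query and real claim labels.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for n claims.

    `value` maps a frozenset of claim indices to a float, e.g. the model's
    predicted probability when given only those claims (an assumption here).
    Cost is exponential in n, so this is only practical for a few claims.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes claim i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def leakage_weighted_rate(phi, leaked):
    """Share of absolute Shapley mass carried by claims flagged as leaked.

    Hypothetical reading of a Shapley-weighted leakage rate, not the
    paper's definition.
    """
    total = sum(abs(p) for p in phi)
    if total == 0:
        return 0.0
    return sum(abs(p) for p, is_leaked in zip(phi, leaked) if is_leaked) / total


# Hypothetical toy setup: three claims with additive effects on the prediction.
CLAIM_WEIGHTS = [0.1, 0.3, 0.2]
LEAKED = [False, True, False]  # only the second claim uses post-cutoff knowledge


def toy_value(subset):
    """Stand-in for querying the model with only the claims in `subset`."""
    return sum(CLAIM_WEIGHTS[i] for i in subset)


phi = shapley_values(len(CLAIM_WEIGHTS), toy_value)
rate = leakage_weighted_rate(phi, LEAKED)
print(phi, rate)
```

Because the toy game is additive, each claim's Shapley value equals its weight, so the single leaked claim carries half of the attribution mass. A real value function would include interactions between claims, which is where Shapley values differ from simple per-claim ablation.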
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Auditing, Earnings Management, Governance · Artificial Intelligence in Law
