Teaching Large Language Models When Not to Know: Learning Temporal Critique for Ex-Ante Reasoning
Chenlu Ding, Jiancan Wu, Yanchen Luo, Zheyuan Liu, Yancheng Yuan, Xiang Wang

TL;DR
This paper investigates why large language models fail to reason under temporal cutoffs and introduces a fine-tuning method, TCFT, that improves their ability to verify whether a response is temporally admissible.
Contribution
The paper identifies the limitations of prompting and supervised fine-tuning for reasoning under a temporal cutoff and proposes TCFT, a fine-tuning framework that strengthens a model's ability to verify whether a response is temporally admissible.
Findings
TCFT reduces temporal leakage by over 40 percentage points.
Explicit cutoff prompts outperform implicit historical framings.
Fine-tuning with TCFT improves temporal admissibility judgment accuracy.
Abstract
Large language models (LLMs) often fail to reason under temporal cutoffs: when prompted to answer from the standpoint of an earlier time, they exploit knowledge that became available only later. We study this failure through the lens of ex-ante reasoning, where a model must rely exclusively on information knowable before a cutoff. Through a systematic analysis of prompt-level interventions, we find that temporal leakage is highly sensitive to cutoff formulation and instruction placement: explicit cutoff statements outperform implicit historical framings, and prefix constraints reduce leakage more effectively than suffix constraints. These findings indicate that prompting can steer models into a temporal frame, but does not endow them with the ability to verify whether a response is temporally admissible. We further argue that supervised fine-tuning is insufficient, since ex-ante…
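The two prompt-level factors named in the abstract, cutoff formulation (explicit vs. implicit) and instruction placement (prefix vs. suffix), can be illustrated with a small sketch. This is not the paper's prompt set: the function names `build_prompt` and `all_variants` and the exact constraint wording are hypothetical, chosen only to show how the four combinations differ.

```python
from itertools import product

STYLES = ("explicit", "implicit")
PLACEMENTS = ("prefix", "suffix")


def build_prompt(question: str, cutoff: str, style: str, placement: str) -> str:
    """Compose one prompt variant (hypothetical wording, not the paper's)."""
    if style == "explicit":
        # Explicit formulation: state the cutoff as a direct constraint.
        constraint = f"Use only information available before {cutoff}."
    elif style == "implicit":
        # Implicit formulation: set a historical frame without stating a rule.
        constraint = f"Imagine it is currently {cutoff}."
    else:
        raise ValueError(f"unknown style: {style!r}")

    if placement == "prefix":
        # Prefix placement: the constraint comes before the question.
        return f"{constraint}\n{question}"
    if placement == "suffix":
        # Suffix placement: the constraint comes after the question.
        return f"{question}\n{constraint}"
    raise ValueError(f"unknown placement: {placement!r}")


def all_variants(question: str, cutoff: str) -> dict:
    """Return all four (style, placement) combinations for one question."""
    return {
        (style, placement): build_prompt(question, cutoff, style, placement)
        for style, placement in product(STYLES, PLACEMENTS)
    }


if __name__ == "__main__":
    variants = all_variants("Who is likely to win the next election?", "January 2020")
    for key, prompt in variants.items():
        print(key, "->", repr(prompt))
```

According to the abstract, the explicit, prefix-placed variant is the configuration that reduces leakage most among these prompt-level options, though prompting alone does not give the model a way to check whether its answer is temporally admissible.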
