The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations
Rania Elbadry, Ahmed Heakl, Fan Zhang, Dani Bouch, Yuxia Wang, Preslav Nakov, Zhuohan Xie

TL;DR
This paper reveals that temporal knowledge drift in large language models is encoded as a geometric direction orthogonal to correctness, making it undetectable by existing methods based on confidence or uncertainty signals.
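The "blind by construction" argument can be illustrated with a toy simulation. This is a minimal sketch that assumes synthetic Gaussian "activations" rather than real residual-stream states, and assumes that correctness and drift each shift activations along their own exactly orthogonal direction. The names `simulate`, `u_correct`, `u_drift`, and the `shift` value are hypothetical illustrations, not the authors' code. A score read only along the correctness direction should then carry no information about drift labels, while a score read along the drift direction should separate them well.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the Mann-Whitney U statistic: P(score_pos > score_neg), ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def simulate(n=2000, d=64, shift=3.0, seed=0):
    """Hypothetical toy 'residual stream': correctness and drift each shift
    activations along their own direction; the two directions are orthogonal."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, 2)))  # two orthonormal columns
    u_correct, u_drift = q[:, 0], q[:, 1]
    correct = rng.integers(0, 2, n)                    # independent binary labels
    drifted = rng.integers(0, 2, n)
    x = (rng.standard_normal((n, d))
         + shift * correct[:, None] * u_correct
         + shift * drifted[:, None] * u_drift)
    return x, correct, drifted, u_correct, u_drift

x, correct, drifted, u_correct, u_drift = simulate()
# A detector that reads only the correctness direction...
auc_correct_dir = auroc(x @ u_correct, drifted)
# ...versus one that reads the drift direction.
auc_drift_dir = auroc(x @ u_drift, drifted)
print(f"correctness-direction score vs drift labels: AUROC {auc_correct_dir:.2f}")
print(f"drift-direction score vs drift labels:       AUROC {auc_drift_dir:.2f}")
```

In this toy setting the first AUROC sits near 0.5 and the second is high, which is the geometric intuition behind the claim; real activations are of course not this clean.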
Contribution
It introduces a geometric framework for understanding and detecting temporal knowledge drift in LLMs, shows that drift is orthogonal to correctness and uncertainty signals, and detects it with a linear probe trained directly on drift labels.
Findings
A linear probe on drift labels achieves AUROC 0.83-0.95.
Existing methods based on token entropy, semantic entropy, CCS, and SAPLMA remain near chance.
The geometric orthogonality is confirmed by five tests, including weight cosines, score correlations, and bidirectional and iterative null-space projection.
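The probe finding can be sketched as a logistic-regression probe on activations, scored by held-out AUROC. This is a minimal sketch, assuming synthetic activations with a planted drift direction and hypothetical binary drift labels; the paper's actual layers, models, and training setup are not reproduced here, and `train_linear_probe`, `shift`, and the split sizes are illustrative choices.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the Mann-Whitney U statistic: P(score_pos > score_neg), ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def train_linear_probe(x, y, lr=0.5, steps=500, l2=1e-3):
    """Logistic-regression probe fit by full-batch gradient descent."""
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(drifted)
        err = p - y
        w -= lr * (x.T @ err / n + l2 * w)       # gradient of mean log-loss + L2
        b -= lr * err.mean()
    return w, b

rng = np.random.default_rng(1)
n, d, shift = 2000, 64, 1.5
u_drift = rng.standard_normal(d)
u_drift /= np.linalg.norm(u_drift)               # hypothetical planted drift direction
drifted = rng.integers(0, 2, n)                  # hypothetical drift labels
x = rng.standard_normal((n, d)) + shift * drifted[:, None] * u_drift

train, test = slice(0, 1500), slice(1500, None)
w, b = train_linear_probe(x[train], drifted[train])
probe_auc = auroc(x[test] @ w + b, drifted[test])
cos_to_true = float(w @ u_drift / np.linalg.norm(w))
print(f"held-out AUROC {probe_auc:.2f}, cosine to planted direction {cos_to_true:.2f}")
```

The probe only sees drift labels, never correctness or uncertainty, so whatever direction it recovers is a drift direction by construction; on real models one would extract `x` from a chosen residual-stream layer instead.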
Abstract
Large language models confidently produce outdated answers, and no existing method can detect them. We show this is not an engineering failure but a structural one: temporal drift (whether a stored fact has changed since training) is encoded as a direction in the residual stream that is geometrically orthogonal to both correctness and uncertainty. Any method operating on correctness or uncertainty signals is therefore blind to drift by construction. We verify this across six instruction-tuned models. A linear probe trained directly on drift labels achieves AUROC 0.83-0.95, whereas methods based on token entropy, semantic entropy, CCS, and SAPLMA all remain near chance. Five tests confirm the geometric orthogonality: weight cosines, score correlations, bidirectional null-space projection, iterative null-space projection…
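Two of the listed tests, weight cosines and bidirectional null-space projection, can be illustrated on toy data. This is a minimal sketch under several assumptions: synthetic activations with orthogonal correctness and drift directions, a difference-of-class-means probe standing in for whatever probe the authors trained, and a single rank-1 projection rather than the iterative version (iterative null-space projection repeats the erase-and-refit step with freshly trained probes). All function names are hypothetical. If the two signals are orthogonal, the probe weights should have near-zero cosine, and erasing either direction should leave the other probe's AUROC essentially intact.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the Mann-Whitney U statistic: P(score_pos > score_neg), ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def mean_diff_probe(x, y):
    """Difference-of-class-means direction: a simple linear probe."""
    y = np.asarray(y, dtype=bool)
    return x[y].mean(axis=0) - x[~y].mean(axis=0)

def project_out(x, w):
    """Remove each row's component along w (rank-1 null-space projection)."""
    u = w / np.linalg.norm(w)
    return x - np.outer(x @ u, u)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(2)
n, d, shift = 3000, 64, 2.0
q, _ = np.linalg.qr(rng.standard_normal((d, 2)))  # orthonormal toy directions
u_correct, u_drift = q[:, 0], q[:, 1]
correct = rng.integers(0, 2, n)
drifted = rng.integers(0, 2, n)
x = (rng.standard_normal((n, d))
     + shift * correct[:, None] * u_correct
     + shift * drifted[:, None] * u_drift)

train, test = slice(0, 2000), slice(2000, None)
w_correct = mean_diff_probe(x[train], correct[train])
w_drift = mean_diff_probe(x[train], drifted[train])

# Weight cosine between the two probes.
weight_cos = cosine(w_correct, w_drift)

# Erase the correctness direction, then refit and evaluate a drift probe.
x_no_correct = project_out(x, w_correct)
w_drift_after = mean_diff_probe(x_no_correct[train], drifted[train])
drift_auc_after = auroc(x_no_correct[test] @ w_drift_after, drifted[test])

# Erase the drift direction, then refit and evaluate a correctness probe.
x_no_drift = project_out(x, w_drift)
w_correct_after = mean_diff_probe(x_no_drift[train], correct[train])
correct_auc_after = auroc(x_no_drift[test] @ w_correct_after, correct[test])

print(f"weight cosine {weight_cos:+.2f}")
print(f"drift AUROC after erasing correctness: {drift_auc_after:.2f}")
print(f"correctness AUROC after erasing drift: {correct_auc_after:.2f}")
```

The difference-of-means probe keeps the sketch short; a logistic-regression probe could be swapped in without changing the logic of either test.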
