Still Fresh? Evaluating Temporal Drift in Retrieval Benchmarks

Nathan Kuissi; Suraj Subrahmanyan; Nandan Thakur; Jimmy Lin

arXiv:2603.04532 · cs.IR · March 6, 2026

TL;DR

This paper investigates how temporal changes in technical corpora affect the reliability of retrieval benchmarks. It finds that a benchmark can remain reliable over a one-year span despite corpus evolution, with only minor shifts in model rankings.

Contribution

It provides an empirical analysis of temporal corpus drift in retrieval benchmarks and demonstrates that such benchmarks can stay reliable over time with minor ranking shifts.

Findings

01. All but one query from 2024 remains fully supported by the 2025 corpus, despite corpus changes
02. Retrieval model rankings stay highly correlated across the two snapshots (Kendall τ up to 0.978)
03. Temporal corpus drift has limited impact on benchmark reliability
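Finding 01 depends on deciding whether a query written against an older corpus is still answerable from a newer snapshot. The sketch below is a minimal illustration under a loudly stated assumption: each query is represented by a set of answer nuggets, and a nugget counts as supported when at least one document in the snapshot supports it. The paper's exact criterion may differ, and the function name `support_report`, the query IDs, and the nuggets are hypothetical placeholders, not data from the paper.

```python
def support_report(
    query_nuggets: dict[str, set[str]], supported_nuggets: set[str]
) -> tuple[float, list[str]]:
    """Return (fraction of fully supported queries, sorted IDs of the rest).

    Hypothetical, simplified criterion: a query is fully supported only if
    every one of its nuggets appears in `supported_nuggets`.
    """
    if not query_nuggets:
        raise ValueError("no queries to evaluate")
    unsupported = sorted(
        qid
        for qid, nuggets in query_nuggets.items()
        if not nuggets <= supported_nuggets  # subset test: are all nuggets covered?
    )
    rate = 1 - len(unsupported) / len(query_nuggets)
    return rate, unsupported


# Hypothetical 2024 queries mapped to the nuggets an answer must cover.
queries_2024 = {
    "q1": {"n1", "n2"},
    "q2": {"n3"},
    "q3": {"n4", "n5"},
}
# Hypothetical nuggets judged supported by some document in the 2025 snapshot.
supported_in_2025 = {"n1", "n2", "n3", "n4"}

rate, missing = support_report(queries_2024, supported_in_2025)
print(f"fully supported: {rate:.1%}, unsupported: {missing}")
```

Here `q3` loses support because its nugget `n5` has no supporting document in the newer snapshot. Using a set-subset test keeps the check to a single pass per query.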

Abstract

Information retrieval (IR) benchmarks typically follow the Cranfield paradigm, relying on static, predefined corpora. However, temporal changes in technical corpora, such as API deprecations and code reorganizations, can render existing benchmarks stale. In this work, we investigate how temporal corpus drift affects FreshStack, a retrieval benchmark focused on technical domains. We examine two independent FreshStack corpus snapshots, from October 2024 and October 2025, built to answer questions about LangChain. Our analysis shows that all but one query posed in 2024 remain fully supported by the 2025 corpus, as relevant documents "migrate" from LangChain to competitor repositories such as LlamaIndex. Next, we compare the accuracy of retrieval models on both snapshots and observe only minor shifts in model rankings, with overall strong correlation of up to 0.978 Kendall $\tau$ at…
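The abstract measures ranking stability across snapshots with Kendall τ. As a hedged illustration only, the sketch below computes Kendall's tau-a (no tie correction) between two sets of per-model scores. The paper may use a tie-corrected variant such as tau-b, and the function name, model names, and scores are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau_a(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Kendall's tau-a between two model rankings given as {model: score} dicts.

    Tied pairs count as neither concordant nor discordant (no tie correction).
    """
    if set(scores_a) != set(scores_b):
        raise ValueError("both snapshots must score the same set of models")
    models = sorted(scores_a)
    if len(models) < 2:
        raise ValueError("need at least two models to compare rankings")
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        # Same sign of score difference in both snapshots: pair is ordered the same way.
        product = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical effectiveness scores for four placeholder models on each snapshot.
scores_2024 = {"model_a": 0.52, "model_b": 0.47, "model_c": 0.41, "model_d": 0.38}
scores_2025 = {"model_a": 0.50, "model_b": 0.44, "model_c": 0.45, "model_d": 0.36}

tau = kendall_tau_a(scores_2024, scores_2025)
print(f"Kendall tau-a: {tau:.3f}")
```

With these placeholder scores, only the `model_b`/`model_c` pair swaps order, so 5 of 6 pairs are concordant and τ = (5 - 1) / 6 ≈ 0.667.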

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Expert finding and Q&A systems