DS@GT at LongEval: Evaluating Temporal Performance in Web Search Systems and Topics with Two-Stage Retrieval
Anthony Miyaguchi, Imran Afrulbasha, Aleksandar Pramov

TL;DR
This paper evaluates how well a two-stage retrieval system holds up over time on the temporally distributed web snapshots of the LongEval lab, and reports how retrieval quality varies from snapshot to snapshot as web content evolves.
Contribution
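It describes a two-phase retrieval system that runs sparse keyword search with query expansion and then reranks the retrieved documents. It also provides an exploratory analysis of the Qwant web dataset, including topic modeling over time, and compares performance across web snapshots in the LongEval setting.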
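To make the two-phase idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the BM25 scorer, the `synonyms` expansion table, and the term-overlap reranker are all assumptions chosen for illustration, and every function name is hypothetical. The summary does not say which expansion or reranking models the paper uses.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def expand_query(query, synonyms):
    """Query expansion: append hypothetical synonyms for each query term."""
    terms = tokenize(query)
    expanded = list(terms)
    for term in terms:
        expanded.extend(synonyms.get(term, []))
    return expanded


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Stage 1: score every document with BM25, a sparse keyword model."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rerank(query, candidates):
    """Stage 2 placeholder: order candidates by the fraction of original
    query terms they contain. A real system would use a stronger model."""
    q_terms = set(tokenize(query))

    def overlap(doc):
        return len(q_terms & set(tokenize(doc))) / max(len(q_terms), 1)

    return sorted(candidates, key=overlap, reverse=True)


def two_stage_search(query, docs, synonyms, k=3):
    """Retrieve the top-k documents with the expanded query, then rerank."""
    scores = bm25_scores(expand_query(query, synonyms), docs)
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    candidates = [docs[i] for i in top if scores[i] > 0]
    return rerank(query, candidates)


docs = [
    "cheap train tickets paris",
    "weather forecast paris",
    "inexpensive rail fares to paris",
    "recipe for apple pie",
]
synonyms = {"cheap": ["inexpensive"], "train": ["rail"]}
results = two_stage_search("cheap train paris", docs, synonyms)
```

In this toy run, expansion lets the first stage retrieve the "inexpensive rail fares" document even though it shares only one term with the original query, while the second stage reorders the candidates against the unexpanded query.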
Findings
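Best system achieved an average NDCG@10 of 0.296 across the entire training and test dataset
Performance varied across web snapshots, with the best single-snapshot score of 0.395 on 2023-05
Source code is publicly available at https://github.com/dsgt-arc/longeval-2025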
Abstract
Information Retrieval (IR) models are often trained on static datasets, making them vulnerable to performance degradation as web content evolves. The DS@GT competition team participated in the Longitudinal Evaluation of Model Performance (LongEval) lab at CLEF 2025, which evaluates IR systems across temporally distributed web snapshots. Our analysis of the Qwant web dataset includes exploratory data analysis with topic modeling over time. The two-phase retrieval system employs sparse keyword searches, utilizing query expansion and document reranking. Our best system achieves an average NDCG@10 of 0.296 across the entire training and test dataset, with an overall best score of 0.395 on 2023-05. The accompanying source code for this paper is at https://github.com/dsgt-arc/longeval-2025
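
For readers unfamiliar with the metric, the sketch below shows one common way to compute NDCG@10 for a single query and then average it per snapshot. It assumes linear gain (some tools use 2^rel - 1 instead), and the function names and the snapshot dictionary layout are hypothetical. It does not reproduce the paper's evaluation pipeline or its numbers.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances=None, k=10):
    """NDCG@k for one query.

    ranked_relevances: relevance labels in the order the system returned them.
    all_relevances: labels of every judged document for the query; if omitted,
    the ideal ranking is built from the returned documents only.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


def mean_by_snapshot(per_query_scores):
    """Average per-query scores within each snapshot (e.g. keyed by month)."""
    return {snapshot: sum(scores) / len(scores)
            for snapshot, scores in per_query_scores.items()}
```

A perfect ranking scores 1.0, and a ranking that places the only relevant document second scores 1/log2(3), roughly 0.63. Comparing the per-snapshot means returned by `mean_by_snapshot` is one simple way to see the kind of temporal variation the abstract describes.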
