Replicability Measures for Longitudinal Information Retrieval Evaluation
Jüri Keller, Timo Breuer, Philipp Schaer

TL;DR
This paper investigates the stability of IR system effectiveness over time using replicability measures, revealing that high initial performance does not guarantee long-term consistency in evolving information retrieval environments.
Contribution
It introduces a framework for assessing the longitudinal replicability of IR system effectiveness using adapted measures based on the LongEval shared task.
Findings
Measured effectiveness progressively deteriorates over time relative to the initial measurement.
The ranking of IR systems varies across retrieval measures and over time.
High initial effectiveness does not ensure persistence.
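The findings above can be illustrated with a minimal sketch. It is not the paper's adapted replicability measures, which this summary does not spell out. It assumes two generic stand-ins: the relative change of a score against its initial measurement, and Kendall's tau (tau-a, without tie correction) between the system rankings at two points in time. The system names and scores are hypothetical and exist only to make the example runnable.

```python
from itertools import combinations


def relative_change(initial, later):
    """Relative change of a later score against the initial score."""
    return (later - initial) / initial


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts over the same systems.

    No tie correction is applied; tied pairs count as neither
    concordant nor discordant.
    """
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical effectiveness scores (not taken from the paper).
initial = {"sys_a": 0.40, "sys_b": 0.35, "sys_c": 0.30}
later = {"sys_a": 0.28, "sys_b": 0.33, "sys_c": 0.27}

for name in sorted(initial):
    change = relative_change(initial[name], later[name])
    print(f"{name}: relative change {change:+.2f}")

print(f"Kendall's tau between rankings: {kendall_tau(initial, later):.2f}")
```

In this made-up data the initially best system loses the most and drops below the second one, so the rank correlation falls below 1 even though every system degrades.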
Abstract
Information Retrieval (IR) systems are exposed to constant changes in most components. Documents are created, updated, or deleted, information needs change, and even relevance might not be static. While IR systems are generally expected to retain a consistent utility for their users, test collection evaluations rely on a fixed experimental setup. Based on the LongEval shared task and test collection, this work explores how the effectiveness measured in evolving experiments can be assessed. Specifically, the persistency of effectiveness is investigated as a replicability task. Effectiveness is observed to deteriorate progressively over time compared to the initial measurement. Employing adapted replicability measures provides further insight into the persistence of effectiveness. The ranking of systems varies across retrieval measures and time. In conclusion, it…
Taxonomy
Topics: Technology and Data Analysis · Information Retrieval and Search Behavior · Recommender Systems and Techniques
