Online Information Retrieval Evaluation using the STELLA Framework
Timo Breuer, Narges Tavakolpoursaleh, Johann Schaible, Daniel Hienert, Philipp Schaer, Leyla Jael Castro

TL;DR
The paper introduces the STELLA framework, an infrastructure that enables large-scale A/B testing of academic search systems by integrating user interactions and log analysis in real-world settings.
Contribution
It presents an evaluation infrastructure that lets experimental IR systems run alongside production academic search systems, so they can be evaluated continuously with real user interactions.
Findings
Enables large-scale A/B experiments with real users
Integrates user interactions and log analysis for IR evaluation
Supports continuous, real-world system assessment
Abstract
Involving users in early phases of software development has become a common strategy as it enables developers to consider user needs from the beginning. Once a system is in production, new opportunities to observe, evaluate and learn from users emerge as more information becomes available. Gathering information from users to continuously evaluate their behavior is a common practice for commercial software, while the Cranfield paradigm remains the preferred option for Information Retrieval (IR) and recommendation systems in the academic world. Here we introduce the Infrastructures for Living Labs STELLA project, which aims to create an evaluation infrastructure allowing experimental systems to run alongside production web-based academic search systems with real users. STELLA combines user interactions and log file analyses to enable large-scale A/B experiments for academic search.
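To make the idea of a log-based A/B experiment concrete, the following is a minimal sketch, not STELLA's actual implementation. It assumes a hypothetical log in which each session was served by either the production or the experimental system and records whether the user clicked a result. The record layout, the system names, and the choice of click-through rate as the metric are all illustrative assumptions that do not come from the paper.

```python
from collections import defaultdict

# Hypothetical interaction log: (session_id, system, clicked).
# The schema and the values are invented for illustration only.
LOG = [
    ("s1", "production", True),
    ("s2", "experimental", False),
    ("s3", "production", False),
    ("s4", "experimental", True),
    ("s5", "experimental", True),
    ("s6", "production", False),
]


def click_through_rates(log):
    """Return, per system, the fraction of logged sessions with a click."""
    sessions = defaultdict(int)
    clicks = defaultdict(int)
    for _session_id, system, clicked in log:
        sessions[system] += 1
        if clicked:
            clicks[system] += 1
    return {system: clicks[system] / sessions[system] for system in sessions}


rates = click_through_rates(LOG)
for system, rate in sorted(rates.items()):
    print(f"{system}: {rate:.2f}")
```

In a real experiment the comparison would need far more sessions and a significance test before one system is declared better than the other.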
