Evaluating Generative Ad Hoc Information Retrieval
Lukas Gienapp, Harrisen Scells, Niklas Deckers, Janek Bevendorff, Shuai Wang, Johannes Kiesel, Shahbaz Syed, Maik Fröbe, Guido Zuccon, Benno Stein, Matthias Hagen, Martin Potthast

TL;DR
This paper explores the challenges of evaluating generative retrieval systems that produce grounded text responses, proposing a foundation for developing new evaluation methods by surveying relevant literature and system architectures.
Contribution
It introduces a new user model and operationalizes it to address the limitations of existing ranking-based evaluation methods for generative retrieval.
Findings
Identified the gap in evaluation methodologies for generative retrieval
Developed a new user model for assessing generated responses
Surveyed literature to inform future evaluation frameworks
Abstract
Recent advances in large language models have enabled the development of viable generative retrieval systems. Instead of a traditional document ranking, generative retrieval systems often directly return a grounded generated text as a response to a query. Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval. Yet, the established evaluation methodology for ranking-based ad hoc retrieval is not suited for the reliable and reproducible evaluation of generated responses. To lay a foundation for developing new evaluation methods for generative retrieval systems, we survey the relevant literature from the fields of information retrieval and natural language processing, identify search tasks and system architectures in generative retrieval, develop a new user model, and study its operationalization.
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Artificial Intelligence in Games
Methods: High-Order Consensuses
