Evaluating Generative Ad Hoc Information Retrieval

Lukas Gienapp; Harrisen Scells; Niklas Deckers; Janek Bevendorff; Shuai Wang; Johannes Kiesel; Shahbaz Syed; Maik Fröbe; Guido Zuccon; Benno Stein; Matthias Hagen; Martin Potthast

arXiv:2311.04694 · cs.IR · May 24, 2024 · 1 citation

Open Access

TL;DR

This paper explores the challenges of evaluating generative retrieval systems that produce grounded text responses, proposing a foundation for developing new evaluation methods by surveying relevant literature and system architectures.

Contribution

It introduces a new user model and operationalizes it to address the limitations of existing ranking-based evaluation methods for generative retrieval.

Findings

01. Identified the gap in evaluation methodologies for generative retrieval
02. Developed a new user model for assessing generated responses
03. Surveyed literature to inform future evaluation frameworks

Abstract

Recent advances in large language models have enabled the development of viable generative retrieval systems. Instead of a traditional document ranking, generative retrieval systems often directly return a grounded generated text as a response to a query. Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval. Yet, the established evaluation methodology for ranking-based ad hoc retrieval is not suited for the reliable and reproducible evaluation of generated responses. To lay a foundation for developing new evaluation methods for generative retrieval systems, we survey the relevant literature from the fields of information retrieval and natural language processing, identify search tasks and system architectures in generative retrieval, develop a new user model, and study its operationalization.

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Artificial Intelligence in Games

Methods: High-Order Consensuses