A Comparison of Methods for Evaluating Generative IR

Negar Arabzadeh; Charles L. A. Clarke

arXiv:2404.04044 · cs.IR · April 11, 2024 · 23 citations

TL;DR

This paper compares various evaluation methods for generative information retrieval systems, emphasizing the use of LLM-generated labels and assessing their alignment with human judgments across multiple tasks.

Contribution

The paper introduces and validates several evaluation approaches for Gen-IR, focusing on LLM-generated labels and how closely they agree with human assessments.
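
This page does not say which agreement statistics the paper reports. As an illustrative assumption, a standard way to check whether LLM-generated relevance labels approximate human judgments is Cohen's kappa, computed over the same set of query-response pairs. The function name `cohens_kappa` and the binary labels (1 = relevant, 0 = not relevant) are hypothetical. This is a sketch of the statistic, not the authors' code.

```python
from collections import Counter


def cohens_kappa(human, llm):
    """Cohen's kappa between two label sequences over the same items.

    Hypothetical helper for illustration: compares human assessor labels
    with LLM-generated labels for the same query-response pairs.
    """
    if len(human) != len(llm) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(h == l for h, l in zip(human, llm)) / n
    # Chance agreement: probability both pick the same label independently,
    # using each annotator's own label frequencies.
    h_counts, l_counts = Counter(human), Counter(llm)
    expected = sum(h_counts[c] * l_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:
        # Both used one identical label everywhere; kappa is undefined,
        # so treat this as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for eight query-response pairs (not data from the paper).
human_labels = [1, 0, 1, 1, 0, 0, 1, 0]
llm_labels = [1, 0, 1, 0, 0, 0, 1, 1]

print(cohens_kappa(human_labels, llm_labels))
```

Kappa corrects raw agreement for chance. Here the two label sets agree on 6 of 8 items (0.75), but because each labels half the items "relevant", chance agreement is already 0.5. The result is therefore κ = 0.5.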

Findings

01. LLM-based evaluation methods can approximate human judgments.
02. Different evaluation strategies vary in autonomy and auditability.
03. Validation across TREC tasks shows promising results for some methods.
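
Finding 03 mentions validation on TREC tasks but does not name the statistic used. One common check in offline IR evaluation, assumed here and not confirmed by this page, asks whether systems rank in the same order under LLM-generated labels as under human labels, measured with Kendall's tau. The sketch below computes tau-a, where tied pairs count as neither concordant nor discordant. The function name `kendall_tau`, the system names, and the scores are all hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two scorings of the same systems.

    Hypothetical helper: scores_a could hold each system's effectiveness
    under human labels, scores_b the same metric under LLM labels.
    """
    systems = sorted(scores_a)
    if set(systems) != set(scores_b) or len(systems) < 2:
        raise ValueError("need the same two or more systems in both scorings")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # A pair is concordant if both scorings order s and t the same way.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 (a tie in either scoring) counts as neither.
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Illustrative per-system scores (not results from the paper).
human_scores = {"A": 0.62, "B": 0.55, "C": 0.48, "D": 0.40}
llm_scores = {"A": 0.70, "B": 0.52, "C": 0.58, "D": 0.41}

print(kendall_tau(human_scores, llm_scores))
```

Only the B/C pair swaps order, so 5 of 6 pairs are concordant and tau = (5 - 1) / 6 ≈ 0.67. If SciPy is available, `scipy.stats.kendalltau` computes tau-b by default, which adjusts for ties. When there are no tied scores, as in this example, tau-a and tau-b coincide.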

Abstract

Information retrieval systems increasingly incorporate generative components. For example, in a retrieval augmented generation (RAG) system, a retrieval component might provide a source of ground truth, while a generative component summarizes and augments its responses. In other systems, a large language model (LLM) might directly generate responses without consulting a retrieval component. While there are multiple definitions of generative information retrieval (Gen-IR) systems, in this paper we focus on those systems where the system's response is not drawn from a fixed collection of documents or passages. The response to a query may be entirely new text. Since traditional IR evaluation methods break down under this model, we explore various methods that extend traditional offline evaluation approaches to the Gen-IR context. Offline IR evaluation traditionally employs paid human…

Code & Models

Repositories

narabzad/genir-evaluation (official)

Taxonomy

Topics: Smart Systems and Machine Learning · Machine Learning and ELM

Methods: Focus · ALIGN