An Exam-based Evaluation Approach Beyond Traditional Relevance Judgments

Naghmeh Farzi; Laura Dietz

arXiv:2402.00309·cs.IR·February 2, 2024·2 cites

TL;DR

This paper introduces a novel evaluation method for information retrieval and generation systems that relies on exam questions and answerability rather than traditional relevance judgments, enabling more flexible and ongoing assessment.

Contribution

It proposes the EXAM Answerability Metric and a new paradigm for IR evaluation that does not depend on relevance judgments, using exam questions and answerability as core concepts.

Findings

01 Developed the EXAM Answerability Metric for system evaluation.

02 Introduced two measures: EXAM Cover and EXAM Qrels.

03 Enabled post-hoc expansion and continuous evaluation of systems.

Abstract

Current IR evaluation is based on relevance judgments, created either manually or automatically, with decisions outsourced to Large Language Models (LLMs). We offer an alternative paradigm that never relies on relevance judgments in any form. Instead, a text is defined as relevant if it contains information that enables the answering of key questions. We use this idea to design the EXAM Answerability Metric to evaluate information retrieval/generation systems for their ability to provide topically relevant information. We envision the role of a human judge to edit and define an exam question bank that will test for the presence of relevant information in text. We support this step by generating an initial set of exam questions. In the next phase, an LLM-based question answering system will automatically grade system responses by tracking which exam questions are answerable with which…
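The grading idea in the abstract can be illustrated with a minimal sketch. It assumes a coverage-style score: the fraction of exam questions answerable from at least one passage in a system's response. This is only one plausible reading, not the paper's definition of EXAM Cover or EXAM Qrels, which this page does not spell out. The names `Grader`, `answerable_questions`, `question_coverage`, and `keyword_grader` are hypothetical, and the keyword matcher is a toy stand-in for the LLM-based question answering system.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical grader signature: returns True when `passage` contains enough
# information to answer `question`. The abstract assigns this role to an
# LLM-based question answering system; here it is any callable.
Grader = Callable[[str, str], bool]


def answerable_questions(
    passages: Iterable[str], questions: List[str], grader: Grader
) -> Set[str]:
    """Return the exam questions answerable with at least one passage."""
    passage_list = list(passages)
    return {q for q in questions if any(grader(q, p) for p in passage_list)}


def question_coverage(
    passages: Iterable[str], questions: List[str], grader: Grader
) -> float:
    """Fraction of the exam question bank answerable from the passages.

    Assumption: an empty question bank scores 0.0.
    """
    if not questions:
        return 0.0
    return len(answerable_questions(passages, questions, grader)) / len(questions)


def keyword_grader(question: str, passage: str) -> bool:
    """Toy stand-in for an LLM grader: look for the question's last word."""
    key = question.rstrip("?").split()[-1].lower()
    return key in passage.lower()


if __name__ == "__main__":
    questions = ["What causes tides?", "Who discovered penicillin?"]
    response = [
        "Ocean tides are driven mainly by the Moon's gravity.",
        "Bread is made from flour.",
    ]
    print(question_coverage(response, questions, keyword_grader))
```

Because the score depends only on the question bank and the grader, questions could be added later and every stored response re-scored without collecting new relevance judgments, which fits the post-hoc expansion mentioned in the findings.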

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Technology and Assessment

Methods: Sparse Evolutionary Training · Focus