Open-World Evaluation for Retrieving Diverse Perspectives

Hung-Ting Chen; Eunsol Choi

arXiv:2409.18110 · cs.CL · April 23, 2025

TL;DR

This paper introduces BERDS, a benchmark for evaluating retrieval systems on their ability to surface diverse perspectives on subjective questions, using a language model-based evaluator to assess perspective coverage.

Contribution

The paper presents a new benchmark and evaluation methodology for retrieving diverse perspectives. It addresses the challenge that, for subjective questions, document relevance cannot be judged by string matching against reference answers.

Findings

1. Existing retrievers cover only 40% of perspectives.
2. Query expansion and reranking improve diversity.
3. The language model-based evaluator effectively assesses perspective relevance.
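As an illustration of how reranking can trade relevance for diversity, here is a minimal sketch of maximal marginal relevance (MMR), a widely used generic diversity reranker. It is not necessarily the reranking method evaluated in the paper. The function name `mmr_rerank`, the trade-off weight `lam`, and the pairwise `similarity` callback are illustrative assumptions.

```python
from typing import Callable, List, Sequence


def mmr_rerank(
    query_scores: Sequence[float],
    similarity: Callable[[int, int], float],
    k: int,
    lam: float = 0.5,
) -> List[int]:
    """Greedy MMR reranking (generic technique, hypothetical helper).

    At each step, pick the remaining candidate that maximizes
    lam * relevance - (1 - lam) * (max similarity to already-selected docs).
    Returns the selected candidate indices in selection order.
    """
    remaining = list(range(len(query_scores)))
    selected: List[int] = []

    def mmr_score(i: int) -> float:
        # Redundancy is 0 before anything has been selected.
        redundancy = max((similarity(i, j) for j in selected), default=0.0)
        return lam * query_scores[i] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the reranker reduces to sorting by relevance; lowering `lam` penalizes near-duplicates, so a second document that repeats the first document's perspective can be displaced by a less relevant but different one.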

Abstract

We study retrieving a set of documents that covers various perspectives on a complex and contentious question (e.g., will ChatGPT do more harm than good?). We curate a Benchmark for Retrieval Diversity for Subjective questions (BERDS), where each example consists of a question and diverse perspectives associated with the question, sourced from survey questions and debate websites. On this data, retrievers paired with a corpus are evaluated on whether they surface a document set that contains diverse perspectives. Our framing diverges from most retrieval tasks in that document relevancy cannot be decided by simple string matches to references. Instead, we build a language model-based automatic evaluator that decides whether each retrieved document contains a perspective. This allows us to evaluate the performance of three different types of corpora (Wikipedia, web snapshot, and corpus constructed on…
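To make the evaluation setup concrete, here is a minimal sketch of a perspective-coverage score: the fraction of a question's perspectives that at least one of the top-k retrieved documents is judged to contain. The names `perspective_coverage` and `keyword_judge` are hypothetical, and the keyword matcher is only an offline stand-in for the paper's language model-based evaluator. The exact metric definitions used in BERDS may differ.

```python
from typing import Callable, Sequence

# Hypothetical judge signature: (document, perspective) -> True if the document
# contains the perspective. In BERDS this decision is made by a language model;
# the keyword stub below is only a placeholder so the sketch runs offline.
Judge = Callable[[str, str], bool]


def keyword_judge(document: str, perspective: str) -> bool:
    """Placeholder judge: every word of the perspective must appear in the document."""
    doc = document.lower()
    return all(word in doc for word in perspective.lower().split())


def perspective_coverage(
    retrieved: Sequence[str],
    perspectives: Sequence[str],
    judge: Judge,
    k: int,
) -> float:
    """Fraction of perspectives supported by at least one of the top-k documents."""
    if not perspectives:
        raise ValueError("need at least one perspective")
    top_k = retrieved[:k]
    covered = sum(
        1 for p in perspectives if any(judge(doc, p) for doc in top_k)
    )
    return covered / len(perspectives)
```

Keeping the judge as a parameter separates the coverage arithmetic from the relevance decision, so replacing `keyword_judge` with a call to a language model that answers "does this document contain this perspective?" changes only one argument.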

Videos

Open-World Evaluation for Retrieving Diverse Perspectives (Underline)

Taxonomy

Topics: Geographic Information Systems Studies · Semantic Web and Ontologies

Methods: Sparse Evolutionary Training