Open-World Evaluation for Retrieving Diverse Perspectives
Hung-Ting Chen, Eunsol Choi

TL;DR
This paper introduces BERDS, a benchmark for evaluating retrieval systems on their ability to surface diverse perspectives on subjective questions, using a language model-based evaluator to assess perspective coverage.
Contribution
The paper presents a new benchmark and evaluation methodology for retrieving diverse perspectives, addressing the challenge that relevance for subjective questions cannot be decided by string matching against references.
Findings
Existing retrievers cover only 40% of perspectives.
Query expansion and reranking improve diversity.
A language model-based evaluator effectively judges whether a retrieved document contains a given perspective.
Abstract
We study retrieving a set of documents that covers various perspectives on a complex and contentious question (e.g., will ChatGPT do more harm than good?). We curate a Benchmark for Retrieval Diversity for Subjective questions (BERDS), where each example consists of a question and diverse perspectives associated with the question, sourced from survey questions and debate websites. On this data, retrievers paired with a corpus are evaluated to surface a document set that contains diverse perspectives. Our framing diverges from most retrieval tasks in that document relevancy cannot be decided by simple string matches to references. Instead, we build a language model-based automatic evaluator that decides whether each retrieved document contains a perspective. This allows us to evaluate the performance of three different types of corpus (Wikipedia, web snapshot, and corpus constructed on…
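The evaluation idea in the abstract (judge each retrieved document against each perspective, then ask how many perspectives the document set covers) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `perspective_coverage` and `keyword_judge` are hypothetical, the exact metric used in BERDS may differ, and the toy substring judge merely stands in for the language model-based evaluator.

```python
from typing import Callable, Sequence

# Hypothetical judge signature: (document, perspective) -> True if the
# document contains the perspective. In the paper this role is played by a
# language model-based evaluator; any callable with this shape fits here.
Judge = Callable[[str, str], bool]


def perspective_coverage(
    documents: Sequence[str],
    perspectives: Sequence[str],
    contains: Judge,
) -> float:
    """Fraction of perspectives found in at least one retrieved document."""
    if not perspectives:
        raise ValueError("at least one perspective is required")
    covered = sum(
        1 for p in perspectives if any(contains(doc, p) for doc in documents)
    )
    return covered / len(perspectives)


def keyword_judge(document: str, perspective: str) -> bool:
    # Toy stand-in for illustration only: a case-insensitive substring check.
    # The abstract states that simple string matching is NOT sufficient for
    # this task, so a real judge would be a model call.
    return perspective.lower() in document.lower()


# Made-up example data, loosely based on the question quoted in the abstract.
perspectives = ["more harm", "more good"]
retrieved = [
    "Critics argue ChatGPT will do more harm by spreading misinformation.",
    "The article lists the release history of ChatGPT.",
]

score = perspective_coverage(retrieved, perspectives, keyword_judge)
```

Passing the judge in as a callable keeps the coverage computation independent of how relevance is decided, so the toy judge can be swapped for a model-backed one without touching the metric.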
Taxonomy
Topics: Geographic Information Systems Studies · Semantic Web and Ontologies
Methods: Sparse Evolutionary Training
