UsefulBench: Towards Decision-Useful Information as a Target for Information Retrieval
Tobias Schimanski, Stefanie Lewandowski, Christian Woerle, Nicola Reichenau, Yauheni Huryn, Markus Leippold

TL;DR
UsefulBench introduces a dataset to evaluate whether texts are practically useful for queries, highlighting the gap between relevance and usefulness in information retrieval.
Contribution
The paper presents UsefulBench, a domain-specific dataset labeled for usefulness versus relevance, and analyzes the limitations of current retrieval systems and LLMs in capturing practical utility.
Findings
Classic similarity-based retrieval aligns more with relevance than usefulness.
LLMs can partially address relevance bias but lack domain-specific expertise.
UsefulBench poses a new challenge for developing targeted retrieval systems.
Abstract
Conventional information retrieval is concerned with identifying the relevance of texts for a given query. Yet, the conventional definition of relevance is dominated by aspects of similarity in texts, leaving unobserved whether the text is truly useful for addressing the query. For instance, when answering whether Paris is larger than Berlin, texts about Paris being in France are relevant (lexical/semantic similarity), but not useful. In this paper, we introduce UsefulBench, a domain-specific dataset curated by three professional analysts labeling whether a text is connected to a query (relevance) or holds practical value in responding to it (usefulness). We show that classic similarity-based information retrieval aligns more strongly with relevance. While LLM-based systems can counteract this bias, we find that domain-specific problems require a high degree of expertise, which current…
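The Paris/Berlin example in the abstract can be made concrete with a toy sketch. The code below is not the paper's method and does not use UsefulBench data. It assumes a plain Jaccard token-overlap score as a stand-in for "classic similarity-based retrieval". The candidate texts and their relevant/useful labels are hand-written for illustration (hypothetical). Under these assumptions, the text that merely mentions Paris outranks the text that actually answers the question.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


QUERY = "Is Paris larger than Berlin?"

# Hypothetical candidates with hand-assigned labels (illustration only,
# not taken from UsefulBench).
CANDIDATES = [
    ("Paris is the capital of France.",
     {"relevant": True, "useful": False}),
    ("Berlin has more inhabitants and covers a greater area than the French capital.",
     {"relevant": True, "useful": True}),
    ("The Amazon river flows through Brazil.",
     {"relevant": False, "useful": False}),
]


def rank(query, candidates):
    """Score each candidate by lexical overlap and sort best-first."""
    scored = [(jaccard(query, text), text, labels) for text, labels in candidates]
    return sorted(scored, key=lambda row: row[0], reverse=True)


if __name__ == "__main__":
    for score, text, labels in rank(QUERY, CANDIDATES):
        print(f"{score:.3f}  useful={labels['useful']!s:5}  {text}")
```

The useful text is penalized here because it paraphrases ("French capital", "greater") rather than repeating the query's words. Embedding-based similarity may narrow this particular gap, but the sketch only illustrates why a similarity score by itself does not establish that a text helps answer the query.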
