Offline Retrieval Evaluation Without Evaluation Metrics

Fernando Diaz; Andres Ferraro

arXiv:2204.11400·cs.IR·April 26, 2022

TL;DR

This paper introduces recall-paired preference (RPP), a metric-free evaluation method for offline retrieval that directly compares ranked lists, improving discrimination and robustness over traditional scalar metrics.

Contribution

The paper proposes RPP, a new evaluation approach that avoids scalar metrics, directly compares ranked lists, and better captures differences across user subpopulations.

Findings

01. RPP correlates well with existing metrics.

02. RPP improves discriminative power in evaluations.

03. RPP is robust to incomplete data.

Abstract

Offline evaluation of information retrieval and recommendation has traditionally focused on distilling the quality of a ranking into a scalar metric such as average precision or normalized discounted cumulative gain. We can use this metric to compare the performance of multiple systems for the same request. Although evaluation metrics provide a convenient summary of system performance, they also collapse subtle differences across users into a single number and can carry assumptions about user behavior and utility not supported across retrieval scenarios. We propose recall-paired preference (RPP), a metric-free evaluation method based on directly computing a preference between ranked lists. RPP simulates multiple user subpopulations per query and compares systems across these pseudo-populations. Our results across multiple search and recommendation tasks demonstrate that RPP…
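
The abstract describes RPP as a direct preference between two ranked lists, aggregated over simulated user subpopulations. A minimal Python sketch of one plausible reading follows. It assumes that each pseudo-population corresponds to a recall level (users who stop after finding k relevant items) and that such users prefer the system that reaches that level at a shallower rank. The function names `recall_positions` and `rpp`, the treatment of unretrieved items, and the tie handling are illustrative assumptions, not taken from the paper or from the diazf/pref_eval repository.

```python
from typing import Hashable, List, Sequence, Set


def recall_positions(ranking: Sequence[Hashable], relevant: Set[Hashable]) -> List[float]:
    """Return the 1-based rank at which each successive recall level is reached.

    Assumes the ranking contains no duplicate items. Relevant items that are
    never retrieved are assigned an infinite position (an assumption).
    """
    positions = [float(rank) for rank, item in enumerate(ranking, start=1) if item in relevant]
    missing = len(relevant) - len(positions)
    return positions + [float("inf")] * missing


def rpp(ranking_a: Sequence[Hashable], ranking_b: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Preference of A over B in [-1, 1], averaged over recall levels.

    Each recall level stands in for one hypothetical user subpopulation:
    +1 if A reaches the level earlier, -1 if B does, 0 on a tie.
    """
    if not relevant:
        return 0.0
    pos_a = recall_positions(ranking_a, relevant)
    pos_b = recall_positions(ranking_b, relevant)
    score = 0
    for a, b in zip(pos_a, pos_b):
        if a < b:
            score += 1
        elif a > b:
            score -= 1
    return score / len(relevant)


# Hypothetical query with three relevant documents.
relevant_docs = {"d1", "d2", "d3"}
system_a = ["d1", "x", "d2", "y", "d3"]
system_b = ["x", "d1", "y", "d2", "z"]
print(rpp(system_a, system_b, relevant_docs))
```

In this toy example system A reaches every recall level before system B, so the preference is fully in favor of A. No scalar quality score is computed for either list on its own; only the pairwise comparison is reported, which is the sense in which the approach is metric-free.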

Code & Models

Repositories

diazf/pref_eval (Official)

Taxonomy

Topics: Information Retrieval and Search Behavior · Recommender Systems and Techniques · Multi-Criteria Decision Making