Surprise: Result List Truncation via Extreme Value Theory

Dara Bahri; Che Zheng; Yi Tay; Donald Metzler; Andrew Tomkins

arXiv:2010.09797 · cs.IR · October 21, 2020

TL;DR

This paper introduces Surprise scoring, a statistical method based on extreme value theory that improves result list truncation in information retrieval by producing calibrated relevance scores from the ranked scores alone.

Contribution

The paper proposes a truncation method that uses the Generalized Pareto distribution to calibrate relevance scores in large-scale IR systems.
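
To make the idea of tail-based calibration concrete, here is a minimal peaks-over-threshold sketch in Python. It is not the paper's Surprise algorithm: it fits a Generalized Pareto distribution to the upper tail of a set of background scores using a simple method-of-moments estimator (an assumption chosen to avoid external dependencies), then keeps the leading results whose scores would be improbable under that tail. The function names, the `background`, `tail_fraction`, and `alpha` parameters, and the convention that higher scores mean more relevant are all hypothetical choices for illustration.

```python
import math
from statistics import mean, variance


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (shape xi, scale sigma) for a GPD.

    Assumes the excesses are non-negative and not all identical.
    """
    m = mean(excesses)
    v = variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return xi, sigma


def gpd_survival(y, xi, sigma):
    """P(Y > y) for a GPD with shape xi and scale sigma."""
    if y <= 0:
        return 1.0
    if abs(xi) < 1e-12:  # exponential limit as xi -> 0
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:  # past the finite upper endpoint when xi < 0
        return 0.0
    return base ** (-1.0 / xi)


def truncate_by_tail_pvalue(scores, background, tail_fraction=0.2, alpha=0.01):
    """Return how many top results to keep.

    scores:     candidate scores for one query (higher = more relevant here).
    background: scores assumed to come from non-relevant items (hypothetical input).
    A result is kept while its estimated tail probability under the
    background distribution stays below alpha.
    """
    bg = sorted(background)
    k = max(2, int(len(bg) * tail_fraction))  # number of tail samples
    threshold = bg[-k - 1]                    # requires len(bg) > k
    excesses = [b - threshold for b in bg[-k:]]
    xi, sigma = fit_gpd_moments(excesses)

    kept = 0
    for s in sorted(scores, reverse=True):
        # P(background > s) = P(background > threshold) * P(excess > s - threshold)
        p_value = (k / len(bg)) * gpd_survival(s - threshold, xi, sigma)
        if p_value >= alpha:
            break
        kept += 1
    return kept
```

The returned count is a cut-off position in the ranked list. In a distance-based retrieval system, where smaller distances mean higher relevance, the distances would need to be negated or otherwise transformed before applying a sketch like this.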

Findings

01. Effective across image, text, and IR datasets

02. Outperforms classical and recent baselines

03. Connects to hypothesis testing and p-values

Abstract

Work in information retrieval has largely been centered around ranking and relevance: given a query, return some number of results ordered by relevance to the user. The problem of result list truncation, or where to truncate the ranked list of results, however, has received less attention despite being crucial in a variety of applications. Such truncation is a balancing act between the overall relevance, or usefulness, of the results and the user cost of processing more results. Result list truncation can be challenging because relevance scores are often not well-calibrated. This is particularly true in large-scale IR systems where documents and queries are embedded in the same metric space and a query's nearest document neighbors are returned during inference. Here, relevance is inversely proportional to the distance between the query and candidate document, but what distance…

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Advanced Clustering Algorithms Research