Batch Evaluation Metrics in Information Retrieval: Measures, Scales, and Meaning

Alistair Moffat

arXiv:2207.03103 · cs.IR · July 8, 2022 · 1 cites

Open Access

TL;DR

This paper defends the validity of existing IR evaluation metrics like reciprocal rank and average precision, arguing they are meaningful as ratio-scale measures of usefulness, countering recent claims that only uniform-step interval scales are appropriate.

Contribution

The paper challenges the recent push for interval scales in IR evaluation, defending the use of traditional metrics as valid ratio-scale measures of usefulness.

Findings

01 IR document rankings are categorical data

02 Effectiveness metrics can be represented as ratio-scale usefulness measures

03 Current IR metrics are more meaningful in their original form than intervalized versions

Abstract

A sequence of recent papers has considered the role of measurement scales in information retrieval (IR) experimentation, and presented the argument that (only) uniform-step interval scales should be used, and hence that well-known metrics such as reciprocal rank, expected reciprocal rank, normalized discounted cumulative gain, and average precision, should be either discarded as measurement tools, or adapted so that their metric values lie at uniformly-spaced points on the number line. These papers paint a rather bleak picture of past decades of IR evaluation, at odds with the community's overall emphasis on practical experimentation and measurable improvement. Our purpose in this work is to challenge that position. In particular, we argue that mappings from categorical and ordinal data to sets of points on the number line are valid provided there is an external reason for each target…
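As an illustration of the scale question raised in the abstract, here is a minimal sketch of two of the named metrics under their standard textbook definitions. It is not taken from the paper: the function names `reciprocal_rank` and `average_precision`, the binary-relevance input, and the depth-4 example are all assumptions made for illustration. The sketch shows that the values reciprocal rank can take are not uniformly spaced on the number line, which is the property the interval-scale argument objects to.

```python
from fractions import Fraction


def reciprocal_rank(relevance):
    """Return 1/rank of the first relevant item, or 0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return Fraction(1, rank)
    return Fraction(0)


def average_precision(relevance, total_relevant=None):
    """Average of precision@k over the ranks k that hold a relevant item.

    total_relevant defaults to the number of relevant items in the ranking
    (an assumption; a real collection may hold more than were retrieved).
    """
    if total_relevant is None:
        total_relevant = sum(1 for rel in relevance if rel)
    if total_relevant == 0:
        return Fraction(0)
    hits = 0
    total = Fraction(0)
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += Fraction(hits, rank)
    return total / total_relevant


# Every value RR can take for a ranking of depth 4 (hypothetical depth):
# first relevant item at rank 1..4, or no relevant item at all.
depth = 4
rr_values = sorted(
    {reciprocal_rank([i == k for i in range(depth)]) for k in range(depth + 1)}
)
# Differences between neighbouring attainable values.
gaps = [b - a for a, b in zip(rr_values, rr_values[1:])]

print("attainable RR values:", [str(v) for v in rr_values])
print("gaps between them:   ", [str(g) for g in gaps])
```

The gaps differ from one step to the next, so reciprocal rank does not sit on a uniform-step interval scale. Whether that matters is exactly what the paper disputes.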

Taxonomy

Topics: Information Retrieval and Search Behavior · Advanced Text Analysis Techniques · Data Management and Algorithms