Batch Evaluation Metrics in Information Retrieval: Measures, Scales, and Meaning
Alistair Moffat

TL;DR
This paper defends the validity of existing IR evaluation metrics like reciprocal rank and average precision, arguing they are meaningful as ratio-scale measures of usefulness, countering recent claims that only uniform-step interval scales are appropriate.
Contribution
The paper challenges the recent push for interval scales in IR evaluation, defending the use of traditional metrics as valid ratio-scale measures of usefulness.
Findings
IR document rankings are categorical data
Effectiveness metrics can be represented as ratio-scale usefulness measures
Current IR metrics are more meaningful in their original form than intervalized versions
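As a rough illustration of the non-uniform steps at issue, the sketch below computes reciprocal rank and average precision for a binary relevance vector using their standard textbook definitions. The function names are hypothetical and are not taken from the paper, and the average precision version assumes that every relevant document appears somewhere in the ranking.

```python
from fractions import Fraction


def reciprocal_rank(rels):
    """Return 1/rank of the first relevant item, or 0 if none is relevant."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return Fraction(1, rank)
    return Fraction(0)


def average_precision(rels):
    """Mean of precision@k over the ranks k that hold a relevant item.

    Assumption: all relevant documents appear in `rels`, so the
    normaliser is simply the number of relevant items in the ranking.
    """
    total_relevant = sum(1 for rel in rels if rel)
    if total_relevant == 0:
        return Fraction(0)
    hits = 0
    score = Fraction(0)
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += Fraction(hits, rank)
    return score / total_relevant


# Reciprocal rank when the first relevant item sits at rank 1, 2, ..., 5.
rr_values = [reciprocal_rank([0] * (k - 1) + [1]) for k in range(1, 6)]
# Differences between consecutive attainable values; they are not equal.
rr_gaps = [a - b for a, b in zip(rr_values, rr_values[1:])]
```

Here `rr_values` holds 1, 1/2, 1/3, 1/4, 1/5 and the gaps between them shrink (1/2, 1/6, 1/12, 1/20), so the attainable scores are not uniformly spaced on the number line. That spacing is the property the interval-scale argument objects to, and the one this paper argues is acceptable when the values carry an external meaning as usefulness.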
Abstract
A sequence of recent papers has considered the role of measurement scales in information retrieval (IR) experimentation, and presented the argument that (only) uniform-step interval scales should be used, and hence that well-known metrics such as reciprocal rank, expected reciprocal rank, normalized discounted cumulative gain, and average precision, should be either discarded as measurement tools, or adapted so that their metric values lie at uniformly-spaced points on the number line. These papers paint a rather bleak picture of past decades of IR evaluation, at odds with the community's overall emphasis on practical experimentation and measurable improvement. Our purpose in this work is to challenge that position. In particular, we argue that mappings from categorical and ordinal data to sets of points on the number line are valid provided there is an external reason for each target…
Taxonomy
Topics: Information Retrieval and Search Behavior · Advanced Text Analysis Techniques · Data Management and Algorithms
