How Much Freedom Does An Effectiveness Metric Really Have?
Alistair Moffat, Joel Mackenzie

TL;DR
This paper investigates the inherent constraints on search engine effectiveness metrics, showing that although a metric is free to assign scores to SERPs, its freedom to order SERP pairs is limited by fundamental ordering rules, and proposes a new evaluation paradigm based on innate pairwise relationships.
Contribution
It identifies and formalizes innate pairwise SERP ordering constraints and introduces a measurement approach that emphasizes a single principled metric reinforced by these innate relationships.
Findings
Innate pairwise SERP orderings are prevalent and constrain the relative order of the scores a metric can assign.
Many metrics are not independent due to these ordering constraints.
A new evaluation paradigm is proposed based on innate relationships and a single chosen metric.
Abstract
It is tempting to assume that because effectiveness metrics have free choice to assign scores to search engine result pages (SERPs) there must thus be a similar degree of freedom as to the relative order that SERP pairs can be put into. In fact that second freedom is, to a considerable degree, illusory. That's because if one SERP in a pair has been given a certain score by a metric, fundamental ordering constraints in many cases then dictate that the score for the second SERP must be either not less than, or not greater than, the score assigned to the first SERP. We refer to these fixed relationships as innate pairwise SERP orderings. Our first goal in this work is to describe and defend those pairwise SERP relationship constraints, and tabulate their relative occurrence via both exhaustive and empirical experimentation. We then consider how to employ such innate pairwise…
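The idea of a fixed pairwise relationship can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: SERPs are equal-length vectors of binary relevance, and the ordering rule is prefix dominance (one SERP has at least as many relevant documents as the other at every depth). Under that rule, any metric that is a weighted sum with non-negative, non-increasing rank weights must score the dominating SERP at least as high. The function names `innate_order` and `fraction_constrained` are hypothetical, and the rule set defined by the authors may differ.

```python
from itertools import accumulate, product


def innate_order(a, b):
    """Compare two equal-length binary-relevance SERPs by prefix dominance.

    Returns ">=" if a is never behind b at any depth, "<=" for the reverse,
    "==" if the two are identical, and None if neither dominates (the metric
    is then free to order the pair either way).
    Assumed rule for illustration only, not the paper's definition.
    """
    ca = list(accumulate(a))  # relevant documents seen in a by each depth
    cb = list(accumulate(b))  # relevant documents seen in b by each depth
    a_dominates = all(x >= y for x, y in zip(ca, cb))
    b_dominates = all(y >= x for x, y in zip(ca, cb))
    if a_dominates and b_dominates:
        return "=="
    if a_dominates:
        return ">="
    if b_dominates:
        return "<="
    return None


def fraction_constrained(depth):
    """Exhaustively enumerate all binary SERPs of the given depth and return
    the fraction of distinct pairs whose order is fixed by prefix dominance."""
    serps = list(product((0, 1), repeat=depth))
    pairs = [(a, b) for i, a in enumerate(serps) for b in serps[i + 1:]]
    fixed = sum(innate_order(a, b) is not None for a, b in pairs)
    return fixed / len(pairs)


if __name__ == "__main__":
    print(innate_order((1, 0, 0), (0, 1, 0)))  # relevant document ranked higher in a
    print(innate_order((0, 1, 1), (1, 0, 0)))  # neither SERP dominates
    for depth in (2, 3, 5):
        print(depth, fraction_constrained(depth))
```

In this sketch a pair such as `(0, 1, 1)` versus `(1, 0, 0)` is left unconstrained: a strongly top-weighted metric can prefer the second SERP, while a shallow-weighted one can prefer the first. Pairs like that are where a metric's choice of weights actually matters.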
Taxonomy
Topics: Expert finding and Q&A systems
