A Meta-Evaluation of C/W/L/A Metrics: System Ranking Similarity, System Ranking Consistency and Discriminative Power
Nuo Chen, Tetsuya Sakai

TL;DR
This paper evaluates the statistical stability of C/W/L/A metrics in IR system ranking, focusing on their similarity, consistency, and discriminative power across various aggregations and models.
Contribution
It systematically investigates the stability of C/W/L/A metrics with different aggregations, highlighting the superior performance of ERG and the impact of aggregation choices on metric effectiveness.
Findings
ERG aggregation outperforms others in stability and discriminative power.
Canonical aggregations of Precision, DCG, RBP, INST, and AP show strong performance.
Replacing ERR's canonical aggregation with ERG enhances discriminative power.
Abstract
Recently, Moffat et al. proposed an analytic framework, namely C/W/L/A, for offline evaluation metrics. This framework allows information retrieval (IR) researchers to design evaluation metrics through the flexible combination of user browsing models and user gain aggregations. However, the statistical stability of C/W/L/A metrics with different aggregations has not yet been investigated. In this study, we investigate the statistical stability of C/W/L/A metrics from the perspective of: (1) the system ranking similarity among aggregations, (2) the system ranking consistency of aggregations and (3) the discriminative power of aggregations. More specifically, we combined various aggregation functions with the browsing model of Precision, Discounted Cumulative Gain (DCG), Rank-Biased Precision (RBP), INST, Average Precision (AP) and Expected Reciprocal Rank (ERR), examining their performances in…
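To make "combining a browsing model with an aggregation" concrete, here is a minimal sketch, not the authors' implementation. It assumes the usual C/W/L reading in which a browsing model is a continuation probability C(i), the weight W(i) is proportional to the probability of reaching rank i, and ERG stands for expected rate of gain (the W-weighted sum of per-rank gains). The function names are hypothetical, and normalising W over a truncated ranking is a simplification made for this illustration.

```python
from typing import Callable, Sequence


def weights_from_continuation(C: Callable[[int], float], depth: int) -> list[float]:
    """Derive W(i) for ranks 1..depth from a continuation function C(i).

    Assumption: W(i) is proportional to prod_{j<i} C(j), the probability
    that the user reaches rank i, normalised over the truncated depth.
    """
    reach = []
    prob = 1.0  # the user always inspects rank 1
    for i in range(1, depth + 1):
        reach.append(prob)
        prob *= C(i)  # probability of continuing from rank i to i + 1
    total = sum(reach)
    return [v / total for v in reach]


def expected_rate_of_gain(gains: Sequence[float], C: Callable[[int], float]) -> float:
    """ERG-style aggregation (assumed form): sum over ranks of W(i) * gain(i)."""
    W = weights_from_continuation(C, len(gains))
    return sum(w * g for w, g in zip(W, gains))


def rbp_continuation(p: float) -> Callable[[int], float]:
    """RBP browsing model: constant continuation probability p."""
    return lambda i: p


def precision_continuation(k: int) -> Callable[[int], float]:
    """Precision@k browsing model: always continue until rank k, then stop."""
    return lambda i: 1.0 if i < k else 0.0


gains = [1.0, 0.0, 1.0, 0.0]  # hypothetical per-rank gains for one ranked list
erg_rbp = expected_rate_of_gain(gains, rbp_continuation(0.5))
erg_p2 = expected_rate_of_gain(gains, precision_continuation(2))
```

Swapping the continuation function changes the browsing model while the aggregation stays fixed; swapping `expected_rate_of_gain` for another aggregation would do the reverse, which is the kind of combination the abstract describes.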
Taxonomy
Topics: Information Retrieval and Search Behavior · Recommender Systems and Techniques · Expert finding and Q&A systems
