A Meta-Evaluation of C/W/L/A Metrics: System Ranking Similarity, System Ranking Consistency and Discriminative Power

Nuo Chen; Tetsuya Sakai

arXiv:2307.02936·cs.IR·August 8, 2023

TL;DR

This paper evaluates the statistical stability of C/W/L/A metrics in IR system ranking, focusing on their similarity, consistency, and discriminative power across various aggregations and models.

Contribution

It systematically investigates the stability of C/W/L/A metrics with different aggregations, highlighting the superior performance of ERG and the impact of aggregation choices on metric effectiveness.

Findings

01

ERG aggregation outperforms others in stability and discriminative power.

02

Canonical aggregations of Precision, DCG, RBP, INST, and AP show strong performance.

03

Replacing ERR's canonical aggregation with ERG enhances discriminative power.

Abstract

Recently, Moffat et al. proposed an analytic framework, namely C/W/L/A, for offline evaluation metrics. This framework allows information retrieval (IR) researchers to design evaluation metrics through the flexible combination of user browsing models and user gain aggregations. However, the statistical stability of C/W/L/A metrics with different aggregations has not yet been investigated. In this study, we investigate the statistical stability of C/W/L/A metrics from three perspectives: (1) the system ranking similarity among aggregations, (2) the system ranking consistency of aggregations, and (3) the discriminative power of aggregations. More specifically, we combined various aggregation functions with the browsing models of Precision, Discounted Cumulative Gain (DCG), Rank-Biased Precision (RBP), INST, Average Precision (AP) and Expected Reciprocal Rank (ERR), examining their performances in…
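As a rough illustration of what "combining a browsing model with an aggregation" can look like, the sketch below derives rank weights from a continuation probability C(i) and aggregates gains as a weighted sum, which is how ERG (read here as expected rate of gain, following the C/W/L literature) is usually described. The function names, the toy gain vector, and the choice to normalise weights over a truncated ranking are assumptions made for this example; this is not the authors' code.

```python
from typing import Callable, List


def weights_from_continuation(cont: Callable[[int], float], depth: int) -> List[float]:
    """Turn a continuation function C(i) into normalised rank weights W(i).

    W(i) is proportional to the probability of reaching rank i, i.e. the
    product of C(j) for all j < i. Weights are normalised over the first
    `depth` ranks only (an assumption for truncated rankings).
    """
    reach = []
    prob = 1.0  # the user always inspects rank 1
    for i in range(1, depth + 1):
        reach.append(prob)
        prob *= cont(i)  # probability of moving on from rank i to rank i + 1
    total = sum(reach)
    return [r / total for r in reach]


def expected_rate_of_gain(gains: List[float], cont: Callable[[int], float]) -> float:
    """ERG-style aggregation: weighted sum of per-rank gains."""
    weights = weights_from_continuation(cont, len(gains))
    return sum(w * g for w, g in zip(weights, gains))


def precision_cont(k: int) -> Callable[[int], float]:
    """Browsing model of Precision@k: always continue until rank k, then stop."""
    return lambda i: 1.0 if i < k else 0.0


def rbp_cont(p: float) -> Callable[[int], float]:
    """Browsing model of RBP: continue with constant persistence p."""
    return lambda i: p


# Hypothetical binary gains for one ranked list of five documents.
gains = [1.0, 0.0, 1.0, 0.0, 0.0]
prec_at_5 = expected_rate_of_gain(gains, precision_cont(5))
rbp_trunc = expected_rate_of_gain(gains, rbp_cont(0.8))
```

With the Precision browsing model the weights are uniform, so the ERG-style score reduces to ordinary Precision@5. Swapping in a different aggregation while keeping the same continuation function is the kind of recombination the abstract refers to.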

Taxonomy

Topics: Information Retrieval and Search Behavior · Recommender Systems and Techniques · Expert finding and Q&A systems