Combining Evaluation Metrics via the Unanimous Improvement Ratio and its Application to Clustering Tasks
Enrique Amigó, Julio Gonzalo, Javier Artiles, Felisa Verdejo

TL;DR
The paper introduces the Unanimous Improvement Ratio (UIR), a measure of how robust a difference between two systems is to changes in the relative weights of the evaluation metrics being combined, and validates it through experiments on text clustering.
Contribution
It proposes UIR as a new metric to assess the robustness of system comparisons against weight variations, with empirical validation in text clustering tasks.
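The page does not reproduce the formula for UIR, so the following is only a minimal sketch under an assumed definition: for each test case, check whether system a is at least as good as system b on every metric (a "unanimous" improvement), do the same for b over a, and take the difference of the two counts divided by the number of test cases. The names `unanimously_geq`, `uir`, `system_a`, `system_b` and the metric keys `"P"`/`"R"` are hypothetical, the scores are invented for illustration, and the sketch assumes higher is better for every metric.

```python
from typing import Mapping, Sequence

# One dict per test case, mapping metric name -> score.
# ASSUMPTION: both systems report the same metric names, and higher is better.
Scores = Sequence[Mapping[str, float]]


def unanimously_geq(x: Mapping[str, float], y: Mapping[str, float]) -> bool:
    """True if x is at least as good as y on every metric of one test case."""
    return all(x[m] >= y[m] for m in x)


def uir(a: Scores, b: Scores) -> float:
    """Sketch of UIR(a, b) under the ASSUMED definition:
    (#cases where a >= b on all metrics - #cases where b >= a on all metrics) / #cases.
    Cases where all metrics tie count for both systems and cancel out.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b need the same non-empty list of test cases")
    a_wins = sum(unanimously_geq(x, y) for x, y in zip(a, b))
    b_wins = sum(unanimously_geq(y, x) for x, y in zip(a, b))
    return (a_wins - b_wins) / len(a)


# Hypothetical per-test-case precision (P) and recall (R); not data from the paper.
system_a = [{"P": 0.9, "R": 0.6}, {"P": 0.7, "R": 0.7},
            {"P": 0.8, "R": 0.5}, {"P": 0.6, "R": 0.6}]
system_b = [{"P": 0.8, "R": 0.5}, {"P": 0.6, "R": 0.8},
            {"P": 0.8, "R": 0.5}, {"P": 0.7, "R": 0.4}]

print(uir(system_a, system_b))
```

Here a beats b unanimously on the first case, the third case is a full tie, and the other two cases are mixed (each system wins one metric), so neither system gets credit for them. The point of the construction is that a unanimous improvement on a test case holds for every choice of metric weights, which is why the ratio can serve as a weight-independent signal alongside a weighted score.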
Findings
UIR effectively indicates robustness of system differences.
UIR predicts consistency of results across different test beds.
Experiments confirm UIR's usefulness in clustering evaluation.
Abstract
Many Artificial Intelligence tasks cannot be evaluated with a single quality criterion and some sort of weighted combination is needed to provide system rankings. A problem of weighted combination measures is that slight changes in the relative weights may produce substantial changes in the system rankings. This paper introduces the Unanimous Improvement Ratio (UIR), a measure that complements standard metric combination criteria (such as van Rijsbergen's F-measure) and indicates how robust the measured differences are to changes in the relative weights of the individual metrics. UIR is meant to elucidate whether a perceived difference between two systems is an artifact of how individual metrics are weighted. Besides discussing the theoretical foundations of UIR, this paper presents empirical results that confirm the validity and usefulness of the metric for the Text Clustering…
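The abstract's claim that small changes in relative weights can reorder systems can be illustrated with van Rijsbergen's F-measure in its weighted form, F = 1 / (alpha/P + (1 - alpha)/R), where alpha is the relative weight on precision. The following is a minimal sketch; the two systems and their precision/recall values are hypothetical, and `f_measure` and `best_system` are illustrative helper names.

```python
def f_measure(precision: float, recall: float, alpha: float) -> float:
    """Weighted van Rijsbergen F: 1 / (alpha / P + (1 - alpha) / R)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


def best_system(systems: dict, alpha: float) -> str:
    """Name of the system with the highest F for the given weight alpha."""
    scores = {name: f_measure(p, r, alpha) for name, (p, r) in systems.items()}
    return max(scores, key=scores.get)


# Hypothetical (precision, recall) pairs; not results from the paper.
systems = {"A": (0.80, 0.50), "B": (0.60, 0.65)}

for alpha in (0.3, 0.5, 0.7):
    scores = {n: f_measure(p, r, alpha) for n, (p, r) in systems.items()}
    print(f"alpha={alpha}:",
          ", ".join(f"{n}={s:.3f}" for n, s in scores.items()),
          "->", best_system(systems, alpha))
```

With these numbers B leads at alpha = 0.3 and at the balanced alpha = 0.5 (by less than 0.01), while A leads at alpha = 0.7. That reversal is exactly the kind of weight-dependent difference that UIR is meant to flag.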
Taxonomy
Topics: Advanced Clustering Algorithms Research · Rough Sets and Fuzzy Logic · Data Mining Algorithms and Applications
