Should we really use post-hoc tests based on mean-ranks?
Alessio Benavoli, Giorgio Corani, Francesca Mangili

TL;DR
This paper critiques the common use of mean-ranks post-hoc tests after Friedman tests in algorithm comparison, highlighting their inconsistencies and advocating for pairwise tests like Wilcoxon instead.
Contribution
It identifies fundamental issues with mean-ranks post-hoc tests, demonstrating their dependence on the entire algorithm pool and proposing more reliable pairwise alternatives.
Findings
Mean-ranks test outcomes depend on the pool of algorithms.
Using mean-ranks can lead to paradoxical significance results.
Pairwise tests like Wilcoxon are recommended for more consistent comparisons.
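The pool dependence in the findings above can be illustrated with a small sketch. It assumes the usual form of the mean-ranks statistic (difference in mean ranks divided by sqrt(m(m+1)/(6n)) for m algorithms and n data sets). The accuracy values, the algorithm names C and D, and the helper mean_ranks_z are made up for illustration and do not come from the paper. The scores of A and B are identical in both pools; only the other two algorithms change.

```python
import math


def mean_ranks_z(scores, a, b):
    """Mean-ranks z statistic for algorithms a and b.

    scores maps an algorithm name to its per-data-set scores
    (higher is better). Assumes no ties within a data set.
    """
    names = list(scores)
    m = len(names)          # number of algorithms in the pool
    n = len(scores[a])      # number of data sets
    rank_sum = {name: 0.0 for name in names}
    for d in range(n):
        # Rank 1 goes to the best score on data set d.
        ordered = sorted(names, key=lambda k: scores[k][d], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            rank_sum[name] += rank
    mean_rank = {name: total / n for name, total in rank_sum.items()}
    std_err = math.sqrt(m * (m + 1) / (6.0 * n))
    return (mean_rank[b] - mean_rank[a]) / std_err


# Hypothetical accuracies on six data sets; A always beats B by 0.05.
acc_a = [0.90, 0.91, 0.92, 0.93, 0.94, 0.95]
acc_b = [x - 0.05 for x in acc_a]

# Pool 1: the two extra algorithms are much worse than both A and B.
pool_weak = {"A": acc_a, "B": acc_b, "C": [0.5] * 6, "D": [0.4] * 6}

# Pool 2: the two extra algorithms fall between A and B on every data set.
pool_between = {
    "A": acc_a,
    "B": acc_b,
    "C": [x - 0.02 for x in acc_a],
    "D": [x - 0.03 for x in acc_a],
}

z_weak = mean_ranks_z(pool_weak, "A", "B")
z_between = mean_ranks_z(pool_between, "A", "B")
print(f"z with weak competitors:       {z_weak:.2f}")     # about 1.34
print(f"z with in-between competitors: {z_between:.2f}")  # about 4.02
```

In the first pool the statistic stays below the unadjusted two-sided 5% threshold of 1.96, while in the second it is well above it, even though the paired differences between A and B never changed. A pairwise test computed only from those paired differences would give the same answer in both cases.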
Abstract
The statistical comparison of multiple algorithms over multiple data sets is fundamental in machine learning. This is typically carried out by the Friedman test. When the Friedman test rejects the null hypothesis, multiple comparisons are carried out to establish which are the significant differences among algorithms. The multiple comparisons are usually performed using the mean-ranks test. The aim of this technical note is to discuss the inconsistencies of the mean-ranks post-hoc test with the goal of discouraging its use in machine learning as well as in medicine, psychology, etc. We show that the outcome of the mean-ranks test depends on the pool of algorithms originally included in the experiment. In other words, the outcome of the comparison between algorithms A and B depends also on the performance of the other algorithms included in the original experiment. This can lead to…
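For reference, the mean-ranks statistic is commonly written as below. The notation is an assumption of this note rather than something taken from the abstract: m is the number of algorithms, n the number of data sets, and \bar{R}_A, \bar{R}_B are the mean ranks of A and B computed across all m algorithms.

```latex
z = \frac{\bar{R}_A - \bar{R}_B}{\sqrt{\dfrac{m(m+1)}{6n}}}
```

Both the numerator (ranks are assigned relative to every algorithm in the pool) and the denominator (through m) change when algorithms are added or removed, which is the source of the dependence the abstract describes.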
Taxonomy
Topics: Data Mining Algorithms and Applications · Machine Learning and Data Classification · Algorithms and Data Compression
