Testing Rankings with Cross-Validation
Balázs R. Sziklai, Máté Baranyi, Károly Héberger

TL;DR
This paper evaluates hybrid statistical tests with cross-validation for comparing rankings, finding that Wilcoxon's test with eight folds generally performs best across various data scenarios.
Contribution
It introduces a comprehensive framework for testing ranking similarity using cross-validation combined with multiple statistical tests, and systematically compares their effectiveness.
Findings
Wilcoxon's test with eight folds is most reliable across scenarios.
Dietterich's and Alpaydin's tests excel at controlling type I error but perform poorly on type II error.
The methodology is applicable in diverse fields like machine learning and social sciences.
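In a cross-validated comparison, Wilcoxon's test treats the per-fold scores of two methods as paired samples. The following is a minimal pure-Python sketch of a two-sided exact Wilcoxon signed-rank test, assuming `x` and `y` hold the per-fold scores (for instance, SRD values) of two methods over the same folds. The function names are hypothetical. The page does not state whether the paper uses the exact test or a normal approximation, so this should not be read as the authors' implementation.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def wilcoxon_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Returns (W+, p). Zero differences are dropped (Wilcoxon's convention);
    the null distribution is enumerated over all 2**n sign patterns, which
    is cheap for the handful of folds used in cross-validation.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0  # no nonzero differences: no evidence against H0
    r = average_ranks([abs(v) for v in d])
    w_plus = sum(ri for ri, di in zip(r, d) if di > 0)
    centre = sum(r) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - centre)
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(ri for ri, positive in zip(r, signs) if positive)
        if abs(w - centre) >= observed - 1e-9:  # tolerance for float sums
            extreme += 1
    return w_plus, extreme / 2 ** n


fold_scores_a = [5, 6, 7, 8, 9, 10, 11, 12]
fold_scores_b = [4] * 8
print(wilcoxon_exact(fold_scores_a, fold_scores_b))  # (36.0, 0.0078125)
```

Fold count matters for a two-sided exact test like this one. With n folds and no zero differences, the smallest attainable p-value is 2/2^n. For five folds this is 0.0625, so a 5% test can never reject. For eight folds it drops to about 0.0078.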
Abstract
This research investigates how to determine whether two rankings come from the same distribution. We evaluate three hybrid tests, Wilcoxon's, Dietterich's, and Alpaydin's statistical tests combined with cross-validation (CV), each run with 5 to 10 folds, giving 18 variants in total. We have applied these tests within the framework of a popular comparative statistical test, the Sum of Ranking Differences (SRD), which builds upon the Manhattan distance between the rankings. The methodology is widely applicable, from machine learning to the social sciences. To compare these methods, we have followed an innovative approach borrowed from economics. We designed nine scenarios for testing type I and type II errors. These represent typical situations (that is, different data structures) that CV tests face routinely. The optimal CV method depends on the preferences regarding the…
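The SRD statistic mentioned in the abstract is the Manhattan (L1) distance between two rank vectors, and it can be sketched in a few lines. The Python sketch below assumes both inputs score the same objects in the same order. The reference is simply passed in; in practice it may be a known gold standard or a consensus such as the row-wise average of the compared methods. The `cv_srd` helper and its round-robin fold assignment are hypothetical. They only illustrate one way to produce per-fold SRD values for a paired test and do not reproduce the paper's exact CV procedure.

```python
def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def srd(candidate, reference):
    """Sum of Ranking Differences: L1 distance between the rank vectors
    that `candidate` and `reference` induce on the same objects."""
    rc, rr = average_ranks(candidate), average_ranks(reference)
    return sum(abs(a - b) for a, b in zip(rc, rr))


def cv_srd(candidate, reference, k):
    """Hypothetical k-fold scheme: leave out each fold of objects in turn
    and recompute SRD on the remaining objects, yielding k SRD values."""
    n = len(reference)
    scores = []
    for f in range(k):
        held_out = set(range(f, n, k))  # round-robin fold assignment (assumption)
        keep = [i for i in range(n) if i not in held_out]
        scores.append(srd([candidate[i] for i in keep],
                          [reference[i] for i in keep]))
    return scores


reference = [1, 2, 3, 4, 5]
print(srd([2, 1, 3, 5, 4], reference))  # 4.0
```

Running `cv_srd` for two methods against the same reference yields paired per-fold SRD values, which a test such as Wilcoxon's signed-rank test can then compare.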
Taxonomy
Topics: Advanced Statistical Methods and Models · Bayesian Modeling and Causal Inference · Multi-Criteria Decision Making
