Statistical Multicriteria Benchmarking via the GSD-Front
Christoph Jansen (1), Georg Schollmeyer (2), Julian Rodemann (2), Hannah Blocher (2), Thomas Augustin (2) ((1) Lancaster University Leipzig, (2) Ludwig-Maximilians-Universität München)

TL;DR
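This paper introduces a method for comparing classifiers across multiple quality metrics using the GSD-front. The method accounts for the statistical uncertainty induced by the choice of benchmark suite and checks robustness against small deviations in the underlying assumptions.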
Contribution
It proposes the GSD-front as an information-efficient alternative to the classical Pareto-front, together with a consistent statistical estimator and statistical tests for evaluating classifiers under uncertainty and with verifiable robustness.
Findings
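The GSD-front is an information-efficient alternative to the classical Pareto-front for comparing classifiers.
A consistent statistical estimator for the GSD-front is developed.
Statistical tests are introduced for whether a (potentially new) classifier lies in the GSD-front, with robustness under small deviations in the underlying assumptions.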
Abstract
Given the vast number of classifiers that have been (and continue to be) proposed, reliable methods for comparing them are becoming increasingly important. The desire for reliability is broken down into three main aspects: (1) Comparisons should allow for different quality metrics simultaneously. (2) Comparisons should take into account the statistical uncertainty induced by the choice of benchmark suite. (3) The robustness of the comparisons under small deviations in the underlying assumptions should be verifiable. To address (1), we propose to compare classifiers using a generalized stochastic dominance ordering (GSD) and present the GSD-front as an information-efficient alternative to the classical Pareto-front. For (2), we propose a consistent statistical estimator for the GSD-front and construct a statistical test for whether a (potentially new) classifier lies in the GSD-front of…
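The contrast the abstract draws between the classical Pareto-front and a stochastic-dominance-based ordering can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper. Each classifier's results are stored as a datasets × metrics array of higher-is-better scores. Pareto dominance means being at least as good on every (dataset, metric) entry and strictly better on at least one. Plain empirical first-order stochastic dominance, applied to one metric at a time, stands in for the paper's GSD ordering, which is more general and is not reproduced here. The function names (`pareto_dominates`, `pareto_front`, `first_order_sd`) and the toy scores are hypothetical.

```python
import numpy as np

# Hypothetical toy benchmark: rows = datasets, columns = two higher-is-better metrics.
scores = {
    "A": np.array([[0.90, 0.70], [0.80, 0.60], [0.85, 0.75], [0.95, 0.65]]),
    "B": np.array([[0.82, 0.72], [0.88, 0.58], [0.78, 0.70], [0.93, 0.60]]),
    "C": np.array([[0.70, 0.50], [0.75, 0.55], [0.72, 0.52], [0.80, 0.50]]),
}


def pareto_dominates(a: np.ndarray, b: np.ndarray) -> bool:
    """True if `a` is at least as good as `b` on every (dataset, metric)
    entry and strictly better on at least one (assumed definition)."""
    return bool(np.all(a >= b) and np.any(a > b))


def pareto_front(results: dict) -> list:
    """Names of classifiers that no other classifier Pareto-dominates."""
    return [
        name
        for name, s in results.items()
        if not any(pareto_dominates(t, s) for other, t in results.items() if other != name)
    ]


def first_order_sd(x: np.ndarray, y: np.ndarray) -> bool:
    """Empirical first-order stochastic dominance of sample x over sample y
    for one higher-is-better metric: F_x(t) <= F_y(t) for all t, with strict
    inequality somewhere. Datasets are treated as draws from a population."""
    grid = np.union1d(x, y)
    # Both empirical CDFs are step functions that only jump at observed
    # values, so evaluating them on the union of observations suffices.
    fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return bool(np.all(fx <= fy) and np.any(fx < fy))


print(pareto_front(scores))  # ['A', 'B']
for m in range(2):
    # Marginal (one-metric-at-a-time) dominance of A over B.
    print(m, first_order_sd(scores["A"][:, m], scores["B"][:, m]))  # True for both metrics
```

In this toy example, A and B are Pareto-incomparable because their per-dataset results cross, yet A's distribution of scores over datasets dominates B's on each metric separately. This is the kind of extra information that a dominance-based comparison can exploit when the benchmark datasets are viewed as a sample. The paper's GSD ordering treats the metrics jointly rather than marginally as done here.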
