Statistical Multicriteria Benchmarking via the GSD-Front

Christoph Jansen (1); Georg Schollmeyer (2); Julian Rodemann (2); Hannah Blocher (2); Thomas Augustin (2) ((1) Lancaster University Leipzig; (2) Ludwig-Maximilians-Universität München)

arXiv:2406.03924 · stat.ML · June 7, 2024

TL;DR

This paper introduces a statistically robust method for comparing classifiers across multiple metrics using the GSD-front, addressing uncertainty and robustness in benchmarking.

Contribution

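It proposes the GSD-front as an information-efficient alternative to the classical Pareto-front, together with a consistent statistical estimator for it and statistical tests that account for the uncertainty induced by the choice of benchmark suite and for the robustness of the comparison.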

Findings

01. GSD-front provides an information-efficient comparison framework.

02. A consistent estimator for the GSD-front is developed.

03. Robust statistical tests are introduced for classifier comparison.

Abstract

Given the vast number of classifiers that have been (and continue to be) proposed, reliable methods for comparing them are becoming increasingly important. The desire for reliability is broken down into three main aspects: (1) Comparisons should allow for different quality metrics simultaneously. (2) Comparisons should take into account the statistical uncertainty induced by the choice of benchmark suite. (3) The robustness of the comparisons under small deviations in the underlying assumptions should be verifiable. To address (1), we propose to compare classifiers using a generalized stochastic dominance ordering (GSD) and present the GSD-front as an information-efficient alternative to the classical Pareto-front. For (2), we propose a consistent statistical estimator for the GSD-front and construct a statistical test for whether a (potentially new) classifier lies in the GSD-front of…
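For orientation, the classical Pareto-front that the abstract uses as its point of comparison can be sketched in a few lines. This is only the baseline, not the GSD-front, whose definition is given in the paper and is not reproduced here. The sketch assumes each classifier is summarized by one vector of aggregated metric scores where higher is better; the classifier names, the two metrics, and all numbers are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every metric
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the sorted names of classifiers that no other classifier dominates."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


# Hypothetical (accuracy, robustness) scores, for illustration only.
scores = {
    "clf_a": (0.90, 0.60),
    "clf_b": (0.85, 0.80),
    "clf_c": (0.80, 0.55),
}

front = pareto_front(scores)
print(front)
```

Here `clf_c` is dominated by `clf_a` on both metrics, while `clf_a` and `clf_b` each win on one metric and therefore both remain in the front. Componentwise comparison of this kind uses only the ordering of each metric separately, which is the sense in which the abstract calls the GSD-front an information-efficient alternative.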

Videos

Statistical Multicriteria Benchmarking via the GSD-Front · SlidesLive

Taxonomy

Topics: Reservoir Engineering and Simulation Methods · Statistical and Computational Modeling · Rough Sets and Fuzzy Logic

Methods: Sparse Evolutionary Training