Statistical comparison of classifiers through Bayesian hierarchical modelling
Giorgio Corani, Alessio Benavoli, Janez Demšar, Francesca Mangili, Marco Zaffalon

TL;DR
This paper introduces a Bayesian hierarchical model for comparing two classifiers across multiple data sets. Rather than the reject-or-not verdict of a null hypothesis significance test, it returns the posterior probability that the two classifiers are practically equivalent or significantly different.
Contribution
It presents a novel Bayesian hierarchical approach that jointly analyzes the cross-validation results obtained on all data sets, reducing estimation error and overcoming shortcomings of null hypothesis significance testing (NHST) in classifier comparison.
Findings
Reduces estimation error compared to averaging the cross-validation results of each data set separately
Provides posterior probabilities of classifier equivalence or difference
Improves reliability of classifier comparison across datasets
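To make the second finding concrete, the sketch below shows how such probabilities can be read off a posterior over the accuracy difference between two classifiers. It is only an illustration under strong assumptions: the posterior is taken to be normal, and the function name region_probabilities, the parameter names, and the equivalence margin of 0.01 are hypothetical choices, not taken from the paper.

```python
from statistics import NormalDist


def region_probabilities(post_mean, post_sd, margin=0.01):
    """Split the posterior mass of an accuracy difference into three regions.

    Assumes (for illustration only) a normal posterior with the given mean
    and standard deviation. `margin` is a hypothetical half-width of the
    interval in which the two classifiers count as practically equivalent.
    """
    posterior = NormalDist(post_mean, post_sd)
    p_second_better = posterior.cdf(-margin)            # difference below -margin
    p_equivalent = posterior.cdf(margin) - p_second_better  # within [-margin, margin]
    p_first_better = 1.0 - posterior.cdf(margin)        # difference above +margin
    return p_second_better, p_equivalent, p_first_better


# Example: posterior centred on a 2-point accuracy advantage for the first classifier.
print(region_probabilities(post_mean=0.02, post_sd=0.01))
```

The three values sum to one, so they can be reported directly as the probability of each outcome instead of a single reject-or-not decision.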
Abstract
Usually one compares the accuracy of two competing classifiers via null hypothesis significance tests (NHST). Yet NHST suffers from important shortcomings, which can be overcome by switching to Bayesian hypothesis testing. We propose a Bayesian hierarchical model which jointly analyzes the cross-validation results obtained by two classifiers on multiple data sets. It returns the posterior probability of the accuracies of the two classifiers being practically equivalent or significantly different. A further strength of the hierarchical model is that, by jointly analyzing the results obtained on all data sets, it reduces the estimation error compared to the usual approach of averaging the cross-validation results obtained on a given data set.
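The claim that joint analysis reduces estimation error can be illustrated with a small simulation. The sketch below is not the paper's model. It swaps in a much simpler empirical-Bayes shrinkage estimator, assumes normally distributed differences, and treats cross-validation results as independent draws (which real cross-validation folds are not). The function name simulate and all parameter values are hypothetical.

```python
import random
import statistics


def simulate(n_datasets=200, n_folds=10, tau=0.01, sigma=0.05, seed=0):
    """Compare per-data-set averaging with a simple shrinkage estimator.

    tau   : assumed spread of the true accuracy differences across data sets
    sigma : assumed noise of a single cross-validation result
    Returns (MSE of plain averages, MSE of shrunken estimates, shrinkage weight).
    """
    rng = random.Random(seed)
    # True mean difference on each data set, drawn from a common population.
    true_deltas = [rng.gauss(0.0, tau) for _ in range(n_datasets)]
    # Simulated cross-validation differences: noisy measurements of each true value.
    observed = [[rng.gauss(d, sigma) for _ in range(n_folds)] for d in true_deltas]

    # Estimator 1: average the results of each data set on its own.
    means = [statistics.fmean(x) for x in observed]

    # Estimator 2: pull each average towards the grand mean, using all data sets.
    grand_mean = statistics.fmean(means)
    noise_var = statistics.fmean([statistics.variance(x) for x in observed]) / n_folds
    between_var = max(statistics.variance(means) - noise_var, 0.0)
    # weight = 1 keeps the per-data-set average, weight = 0 uses the grand mean.
    weight = between_var / (between_var + noise_var)
    shrunk = [grand_mean + weight * (m - grand_mean) for m in means]

    def mse(estimates):
        return statistics.fmean([(e - t) ** 2 for e, t in zip(estimates, true_deltas)])

    return mse(means), mse(shrunk), weight


mse_average, mse_shrunk, weight = simulate()
print(f"per-data-set average MSE: {mse_average:.6f}")
print(f"shrinkage estimate MSE:   {mse_shrunk:.6f} (weight {weight:.2f})")
```

The shrinkage step borrows information across data sets, which is the same intuition behind the hierarchical model: noisy per-data-set averages are moved towards the overall mean by an amount that depends on how noisy they are.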
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Neural Networks and Applications · Advanced Statistical Methods and Models
