Interpretable Meta-Measure for Model Performance
Alicja Gosiewska, Katarzyna Woźnica, Przemysław Biecek

TL;DR
This paper introduces Elo-based Predictive Power (EPP), a new interpretable meta-score for model performance that provides probabilistic comparisons and a unified benchmark ontology, addressing limitations of existing measures.
Contribution
The paper presents EPP, a novel performance measure with probabilistic interpretation and a unified benchmark framework, enhancing model comparison and benchmark description.
Findings
EPP scores have a clear probabilistic interpretation.
EPP allows direct comparison of performance differences across datasets.
EPP is validated empirically on 30 classification datasets and on visual data.
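The probabilistic interpretation can be illustrated with a minimal sketch. It assumes, in line with the logistic regression-based design mentioned in the abstract, that the difference between two EPP scores is the log-odds that one model outperforms the other. The function name win_probability and the example scores are hypothetical and are not taken from the paper or its code.

```python
import math


def win_probability(epp_a: float, epp_b: float) -> float:
    """Probability that model A outperforms model B.

    Assumption (not quoted from the paper): the score difference
    epp_a - epp_b is the logit of that probability.
    """
    return 1.0 / (1.0 + math.exp(-(epp_a - epp_b)))


# Hypothetical scores, chosen only for illustration.
scores = {"model_a": 1.2, "model_b": 0.2}

p = win_probability(scores["model_a"], scores["model_b"])
print(round(p, 3))  # 0.731
```

Under this assumption only the difference between scores matters, so a gap of 1.0 corresponds to the same win probability on any dataset, which is what makes differences comparable across datasets.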
Abstract
Benchmarks for the evaluation of model performance play an important role in machine learning. However, there is no established way to describe and create new benchmarks. What is more, the most common benchmarks use performance measures that share several limitations. For example, the difference in performance for two models has no probabilistic interpretation, there is no reference point to indicate whether they represent a significant improvement, and it makes no sense to compare such differences between data sets. We introduce a new meta-score assessment named Elo-based Predictive Power (EPP) that is built on top of other performance measures and allows for interpretable comparisons of models. The differences in EPP scores have a probabilistic interpretation and can be directly compared between data sets. Furthermore, the logistic regression-based design allows for an assessment of…
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference
