Never mind the metrics -- what about the uncertainty? Visualising confusion matrix metric distributions
David Lovell, Dimity Miller, Jaiden Capra, Andrew Bradley

TL;DR
This paper argues for visualizing and understanding the uncertainty in classifier performance metrics, showing that this uncertainty can easily eclipse empirical differences in performance between classifiers.
Contribution
It introduces a geometric and visual framework for representing the uncertainty in confusion matrices and performance metrics, highlighting the impact of data and class imbalance.
Findings
Metrics are highly affected by uncertainty and class imbalance.
Visualizations reveal how posterior predictive distributions influence performance metrics.
Understanding metric uncertainty can temper claims of model superiority.
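To make the findings above concrete, here is a minimal sketch of one way to obtain a posterior predictive distribution for a metric from a single binary confusion matrix. It is not taken from the paper: the Dirichlet-multinomial model with a uniform prior, the choice of accuracy as the metric, the function name `posterior_predictive_accuracy`, and the two example confusion matrices are all assumptions made for illustration.

```python
import numpy as np


def posterior_predictive_accuracy(counts, n_draws=5000, prior=1.0, seed=0):
    """Sample accuracy values from a Dirichlet-multinomial posterior predictive.

    counts: observed (TP, FN, FP, TN).
    prior:  symmetric Dirichlet concentration (an assumed, uniform prior).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n = int(counts.sum())
    # Posterior over the four cell probabilities given the observed counts.
    theta = rng.dirichlet(counts + prior, size=n_draws)
    # One replicate confusion matrix of the same size per posterior draw.
    reps = rng.multinomial(n, theta)
    tp, fn, fp, tn = reps.T
    return (tp + tn) / n


# Two hypothetical classifiers evaluated on 100 examples each.
acc_a = posterior_predictive_accuracy((45, 5, 8, 42), seed=1)  # empirical accuracy 0.87
acc_b = posterior_predictive_accuracy((43, 7, 7, 43), seed=2)  # empirical accuracy 0.86

lo_a, hi_a = np.percentile(acc_a, [2.5, 97.5])
lo_b, hi_b = np.percentile(acc_b, [2.5, 97.5])
print(f"A: 95% interval [{lo_a:.2f}, {hi_a:.2f}]")
print(f"B: 95% interval [{lo_b:.2f}, {hi_b:.2f}]")
```

With only 100 examples per classifier, the two intervals overlap heavily, which is the kind of situation in which a one-point difference in empirical accuracy says little about which model is better.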
Abstract
There are strong incentives to build models that demonstrate outstanding predictive performance on various datasets and benchmarks. We believe these incentives risk a narrow focus on models and on the performance metrics used to evaluate and compare them -- resulting in a growing body of literature to evaluate and compare metrics. This paper strives for a more balanced perspective on classifier performance metrics by highlighting their distributions under different models of uncertainty and showing how this uncertainty can easily eclipse differences in the empirical performance of classifiers. We begin by emphasising the fundamentally discrete nature of empirical confusion matrices and show how binary matrices can be meaningfully represented in a three-dimensional compositional lattice, whose cross-sections form the basis of the space of receiver operating characteristic (ROC) curves.…
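The discreteness described in the abstract can be illustrated with a short sketch. A binary confusion matrix on N examples is four non-negative integers (TP, FN, FP, TN) that sum to N, so only three of them are free, which is why the set of possible matrices can be laid out as a three-dimensional lattice. The code below enumerates that lattice and, as an assumed reading of "cross-section", fixes the number of actual positives to obtain a grid of (FPR, TPR) points. The function names and this interpretation are illustrative, not the paper's own construction.

```python
from math import comb


def binary_confusion_matrices(n):
    """Yield every (TP, FN, FP, TN) of non-negative integers summing to n."""
    for tp in range(n + 1):
        for fn in range(n - tp + 1):
            for fp in range(n - tp - fn + 1):
                yield tp, fn, fp, n - tp - fn - fp


def roc_points(n, positives):
    """Distinct (FPR, TPR) pairs when the number of actual positives is fixed.

    Requires 0 < positives < n so that both rates are defined.
    """
    negatives = n - positives
    return sorted(
        {
            (fp / negatives, tp / positives)
            for tp, fn, fp, tn in binary_confusion_matrices(n)
            if tp + fn == positives
        }
    )


n = 10
lattice = list(binary_confusion_matrices(n))
# The lattice size matches the stars-and-bars count C(n + 3, 3).
print(len(lattice), comb(n + 3, 3))

# Cross-section with 4 actual positives and 6 actual negatives.
grid = roc_points(n, positives=4)
print(len(grid))
```

Under this reading, a cross-section with P positives and N - P negatives contains (P + 1)(N - P + 1) points, so an empirical ROC curve can only pass through a finite grid whose resolution depends on the class counts.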
Taxonomy
Topics: Data Visualization and Analytics · Time Series Analysis and Forecasting · Advanced Clustering Algorithms Research
