Beyond Point Estimates: Distributional Uncertainty in Machine Learning Performance Evaluation

Christoph Lehmann; Yahor Paromau

arXiv:2501.16931 · cs.LG · May 13, 2026

TL;DR

This paper introduces a distributional approach to evaluating machine learning models: performance metrics are treated as random variables and their variability is analyzed directly. The approach is aimed especially at settings where only a small number of performance values is available.

Contribution

The paper proposes methods for analyzing the empirical distribution of performance metrics, enabling statistical inference on the variability and uncertainty of model evaluation.

Findings

01 Statistical inference on the performance distribution is feasible with small samples (10-25).

02 Standard confidence intervals remain valid for small sample sizes.

03 Distributional evaluation offers more detailed model comparison and risk assessment.

Abstract

Machine learning models are often evaluated using point estimates of performance metrics such as accuracy, F1 score, or mean squared error. Such summaries fail to capture the inherent variability induced by stochastic elements of the training process, including data splitting, initialization, and hyperparameter optimization. This work proposes a distributional perspective on model evaluation by treating performance metrics as random quantities rather than fixed values. Instead of focusing solely on aggregate measures, empirical distributions of performance metrics are analyzed using quantiles and corresponding confidence intervals. The study investigates point and interval estimation of quantiles based on real-data use cases for classification and regression tasks, complemented by simulation studies for validation. Special emphasis is placed on small sample sizes, reflecting…
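To make the idea of point and interval estimation of quantiles concrete, here is a minimal sketch in Python. It is not taken from the paper: it uses the textbook distribution-free interval based on order statistics, which may or may not be the estimator the authors study. The function name `quantile_ci` and the synthetic accuracy values are assumptions made purely for illustration.

```python
import numpy as np
from scipy.stats import binom


def quantile_ci(samples, p, conf=0.95):
    """Point estimate and distribution-free CI for the p-quantile.

    Illustrative sketch (not the paper's method). Assumes i.i.d. draws
    from a continuous distribution, so the number of observations at or
    below the true p-quantile is Binomial(n, p).
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    alpha = 1.0 - conf

    # 1-based ranks l < u chosen so that
    # P(X_(l) <= q_p < X_(u)) = F(u - 1) - F(l - 1) >= conf,
    # where F is the Binomial(n, p) CDF.
    l = int(binom.ppf(alpha / 2, n, p))
    u = int(binom.ppf(1 - alpha / 2, n, p)) + 1

    # If a rank falls outside 1..n, the sample is too small to bound
    # that side at this confidence level; report an infinite bound.
    lower = x[l - 1] if l >= 1 else -np.inf
    upper = x[u - 1] if u <= n else np.inf

    # Exact coverage probability achieved by the chosen ranks.
    coverage = binom.cdf(u - 1, n, p) - binom.cdf(l - 1, n, p)

    point = float(np.quantile(x, p))
    return point, float(lower), float(upper), float(coverage)


# Synthetic placeholder data: 20 accuracy values standing in for
# repeated training runs. These are NOT results from the paper.
rng = np.random.default_rng(0)
accuracies = rng.normal(loc=0.90, scale=0.02, size=20)

for p in (0.1, 0.5, 0.9):
    point, lo, hi, cov = quantile_ci(accuracies, p)
    print(f"p={p}: estimate={point:.3f}, CI=[{lo:.3f}, {hi:.3f}], coverage={cov:.3f}")
```

With small samples, the interval for an extreme quantile can be unbounded on one side; the sketch reports this as an infinite bound rather than hiding it, which is one way the limits of small-sample inference become visible.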
