A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
Riccardo Fogliato, Pratik Patil, Mathew Monfort, Pietro Perona

TL;DR
This paper introduces a statistical framework for model evaluation that combines stratification, sampling, and estimation to estimate model accuracy more efficiently and reduce annotation costs in machine learning and computer vision.
Contribution
It proposes a stratification method that applies k-means clustering to predictions of model performance, yielding more efficient estimators than simple random sampling.
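Below is a minimal sketch of the stratify, sample, and estimate pipeline. It rests on several assumptions. Each item carries a hypothetical score `scores[i]`, a predicted probability that the model is correct on it (for example its confidence). `correct[i]` stands in for the human annotation, which would only be collected for sampled items. The sketch runs plain 1-D k-means on the scores, draws a simple random sample inside each stratum with proportional allocation, and combines the stratum means with weights N_h/N. The function names, the allocation rule, and the choice of clustering feature are illustrative. The paper's exact procedure may differ.

```python
import random


def kmeans_1d(values, k, iters=100, seed=0):
    """Lloyd's k-means on scalar values; returns one cluster (stratum) label per value."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)  # needs at least k distinct values
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for each value.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members (unchanged if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def stratified_accuracy(scores, correct, k, n, seed=0):
    """Estimate accuracy from roughly n annotations using k-means strata on `scores`.

    scores:  hypothetical predicted probability that the model is correct, per item
    correct: 0/1 outcomes; only entries of sampled items are read (stand-in for annotation)
    """
    rng = random.Random(seed)
    labels = kmeans_1d(scores, k, seed=seed)
    strata = {}
    for i, h in enumerate(labels):
        strata.setdefault(h, []).append(i)
    N = len(scores)
    estimate = 0.0
    for members in strata.values():
        N_h = len(members)
        # Proportional allocation, at least one draw per stratum; the total may differ slightly from n.
        n_h = min(N_h, max(1, round(n * N_h / N)))
        sample = rng.sample(members, n_h)  # simple random sample within the stratum
        estimate += (N_h / N) * sum(correct[i] for i in sample) / n_h
    return estimate


if __name__ == "__main__":
    # Synthetic pool: calibrated scores with outcomes drawn from them (illustration only).
    rng = random.Random(1)
    scores = [rng.random() for _ in range(1000)]
    correct = [1 if rng.random() < s else 0 for s in scores]
    print("true accuracy:", sum(correct) / len(correct))
    print("stratified estimate (n=100):", stratified_accuracy(scores, correct, k=5, n=100))
```

Weighting each stratum mean by N_h/N keeps the estimate unbiased for any allocation with at least one draw per stratum. Proportional allocation is simply the easiest choice to state.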
Findings
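Stratification via k-means clustering gives more precise accuracy estimates than simple random sampling, with efficiency gains reaching 10x in some settings.
Model-assisted estimators, which leverage predictions of model accuracy, are typically more efficient than the traditional sample-mean estimate.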
The framework reduces annotation costs while maintaining accurate performance estimates.
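One common form of model-assisted estimator is the difference estimator, sketched below. It assumes a hypothetical per-item score `scores[i]` that predicts the probability that the model is correct. The estimator starts from the model-predicted accuracy over the whole pool, which needs no labels. It then corrects that value with the mean residual `correct[i] - scores[i]` over a simple random sample of annotated items. Under simple random sampling this correction keeps the estimate unbiased however poor the scores are, and the variance shrinks as the residuals shrink. This is a generic textbook estimator, not necessarily the exact one used in the paper.

```python
import random
import statistics


def srs_accuracy(correct, n, seed=0):
    """Traditional estimate: mean correctness over a simple random sample of n items."""
    rng = random.Random(seed)
    sample = rng.sample(range(len(correct)), n)
    return sum(correct[i] for i in sample) / n


def difference_accuracy(scores, correct, n, seed=0):
    """Model-assisted difference estimator under simple random sampling.

    scores:  hypothetical predicted probability of correctness for every item (no labels needed)
    correct: 0/1 outcomes; only sampled entries are read (stand-in for annotation)
    """
    rng = random.Random(seed)
    N = len(scores)
    sample = rng.sample(range(N), n)  # same draw as srs_accuracy for the same seed
    predicted = sum(scores) / N  # model-predicted accuracy over the full pool
    residual = sum(correct[i] - scores[i] for i in sample) / n  # mean prediction error on annotated items
    return predicted + residual


if __name__ == "__main__":
    rng = random.Random(2)
    scores = [rng.random() for _ in range(1000)]  # synthetic, calibrated by construction
    correct = [1 if rng.random() < s else 0 for s in scores]
    srs = [srs_accuracy(correct, 50, seed=s) for s in range(1000)]
    diff = [difference_accuracy(scores, correct, 50, seed=s) for s in range(1000)]
    print("true accuracy:", sum(correct) / len(correct))
    print("variance, SRS mean:        ", statistics.pvariance(srs))
    print("variance, difference est.: ", statistics.pvariance(diff))
```

Repeating both estimators over many random draws and comparing their spread gives an empirical view of the efficiency gain on a given pool. With a perfect predictor the residual term vanishes, and the estimator returns the exact accuracy from any sample.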
Abstract
Model performance evaluation is a critical and expensive task in machine learning and computer vision. Without clear guidelines, practitioners often estimate model accuracy using a one-time completely random selection of the data. However, by employing tailored sampling and estimation strategies, one can obtain more precise estimates and reduce annotation costs. In this paper, we propose a statistical framework for model evaluation that includes stratification, sampling, and estimation components. We examine the statistical properties of each component and evaluate their efficiency (precision). One key result of our work is that stratification via k-means clustering based on accurate predictions of model performance yields efficient estimators. Our experiments on computer vision datasets show that this method consistently provides more precise accuracy estimates than the traditional…
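A small calculation makes the efficiency comparison concrete. Take a 0/1 correctness outcome and proportional allocation, and ignore finite-population corrections. The per-annotation variance is then p(1-p) under simple random sampling and Σ_h W_h p_h(1-p_h) under stratification, where W_h is the share of items in stratum h and p_h is the model's accuracy within it. The ratio of the two is the efficiency gain. By the law of total variance, p(1-p) = Σ_h W_h p_h(1-p_h) + Σ_h W_h (p_h - p)², so the gain is never below 1. The stratum weights and accuracies below are hypothetical inputs, not results from the paper.

```python
def proportional_stratification_gain(weights, stratum_acc):
    """Efficiency gain of proportional-allocation stratified sampling over simple random
    sampling for a 0/1 correctness outcome, ignoring finite-population corrections.

    weights:     W_h, share of the pool in each stratum (must sum to 1)
    stratum_acc: p_h, accuracy of the model within each stratum
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    p = sum(w * a for w, a in zip(weights, stratum_acc))  # overall accuracy
    var_srs = p * (1 - p)  # per-annotation variance under SRS
    var_strat = sum(w * a * (1 - a) for w, a in zip(weights, stratum_acc))  # within-stratum part only
    if var_strat == 0:
        return float("inf")  # strata perfectly separate correct from incorrect items
    return var_srs / var_strat


# Hypothetical strata: the sharper the split between likely-correct and likely-wrong items,
# the larger the gain.
print(proportional_stratification_gain([0.7, 0.2, 0.1], [0.99, 0.90, 0.50]))  # ≈ 1.42
print(proportional_stratification_gain([0.9, 0.1], [0.999, 0.05]))  # ≈ 15.3
```

A gain of g means that roughly 1/g as many annotations reach the same standard error as simple random sampling. The gain is large only when the strata separate the items the model gets right from the ones it gets wrong. This is why the accuracy of the performance predictions used for clustering matters.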
Taxonomy
Topics: Fault Detection and Control Systems
Methods: k-Means Clustering
