Semisupervised Classifier Evaluation and Recalibration
Peter Welinder, Max Welling, Pietro Perona

TL;DR
This paper introduces Semisupervised Performance Evaluation (SPE), a method that uses a generative model of a classifier's confidence scores to estimate its performance on a new dataset, and to recalibrate it, from only a small number of labels.
Contribution
The paper presents a semisupervised approach to performance estimation and recalibration, based on a generative model of the classifier's confidence scores, that reduces the number of ground truth labels required.
Findings
Accurately estimates performance curves with few labels
Provides confidence bounds for performance estimates
Enables classifier recalibration using limited labeled data
Abstract
How many labeled examples are needed to estimate a classifier's performance on a new dataset? We study the case where data is plentiful, but labels are expensive. We show that by making a few reasonable assumptions on the structure of the data, it is possible to estimate performance curves, with confidence bounds, using a small number of ground truth labels. Our approach, which we call Semisupervised Performance Evaluation (SPE), is based on a generative model for the classifier's confidence scores. In addition to estimating the performance of classifiers on new datasets, SPE can be used to recalibrate a classifier by re-estimating the class-conditional confidence distributions.
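The idea of modelling confidence scores generatively can be illustrated with a minimal sketch. It assumes each class-conditional score distribution is a single Gaussian and fits the two-component mixture with EM, clamping the few labeled items to their known class. The Gaussian family, the EM fit, and all function names here are illustrative assumptions, not the paper's actual model or inference procedure, and the sketch returns point estimates only, without the confidence bounds the abstract describes.

```python
import numpy as np

def normal_pdf(x, mu, sd):
    """Density of a 1-D Gaussian evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_score_mixture(scores, labels, n_iter=200):
    """EM for a two-component Gaussian mixture over confidence scores.

    labels holds 1 (positive), 0 (negative), or -1 (unlabeled).
    Labeled items keep their responsibility fixed at the known class.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    known = labels >= 0
    # Initialise: known labels clamped, unlabeled items split at the median score.
    resp = np.where(known, labels == 1, scores > np.median(scores)).astype(float)
    for _ in range(n_iter):
        # M-step: class prior and class-conditional means and standard deviations.
        n1, n0 = resp.sum(), (1.0 - resp).sum()
        pi = n1 / scores.size
        mu1 = (resp * scores).sum() / n1
        mu0 = ((1.0 - resp) * scores).sum() / n0
        sd1 = np.sqrt((resp * (scores - mu1) ** 2).sum() / n1) + 1e-6
        sd0 = np.sqrt(((1.0 - resp) * (scores - mu0) ** 2).sum() / n0) + 1e-6
        # E-step: posterior probability of the positive class for each score.
        p1 = pi * normal_pdf(scores, mu1, sd1)
        p0 = (1.0 - pi) * normal_pdf(scores, mu0, sd0)
        post = p1 / (p1 + p0 + 1e-300)
        resp = np.where(known, labels == 1, post).astype(float)
    return pi, (mu0, sd0), (mu1, sd1), resp

def estimate_precision_recall(scores, resp, threshold):
    """Expected precision and recall at a threshold under the fitted model."""
    scores = np.asarray(scores, dtype=float)
    above = scores >= threshold
    expected_tp = resp[above].sum()
    precision = expected_tp / max(above.sum(), 1)
    recall = expected_tp / resp.sum()
    return precision, recall

# Synthetic example: 2500 scores, of which only 20 carry a ground truth label.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.0, 1.0, 2000), rng.normal(3.0, 1.0, 500)])
truth = np.concatenate([np.zeros(2000, dtype=int), np.ones(500, dtype=int)])
labels = np.full(truth.size, -1)
labeled = rng.choice(truth.size, size=20, replace=False)
labels[labeled] = truth[labeled]

pi, neg_params, pos_params, resp = fit_score_mixture(scores, labels)
precision_est, recall_est = estimate_precision_recall(scores, resp, threshold=1.5)
print(f"estimated precision={precision_est:.3f}, recall={recall_est:.3f}")
```

Sweeping `threshold` over the score range traces out an estimated precision-recall curve. The fitted posterior `resp` also doubles as a recalibrated probability for each score, which is one way to read the recalibration use mentioned in the abstract.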
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Algorithms and Data Compression
