Semi-Supervised Approaches to Efficient Evaluation of Model Prediction Performance
Jessica Gronsbell, Tianxi Cai

TL;DR
This paper introduces a semi-supervised method for evaluating the prediction performance of a working regression model, motivated by medical applications where labeled data are costly, and shows that it is more efficient (lower variance) than evaluation based on labeled data alone.
Contribution
It proposes a novel two-step semi-supervised estimation procedure for model evaluation that reduces variance and corrects for overfitting, applicable to phenotyping algorithms from electronic medical records.
Findings
The semi-supervised (SS) estimators are consistent and asymptotically normal.
The SS estimators have smaller variance than their supervised counterparts under correct model specification.
The method performs well in simulations and in real electronic medical record (EMR) studies.
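The variance finding above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' estimator: the three discrete score levels, their weights, the sample sizes, and all function names are hypothetical. Step I calibrates P(Y = 1 | score) on the labeled data by taking the mean label at each score level. The abstract excerpt on this page is cut off before step II is described, so the second step shown here (plugging the calibrated probabilities into the conditional misclassification rate and averaging over the unlabeled scores) is an assumption. No cross-validation or other overfitting correction is included.

```python
import random
import statistics

THRESHOLD = 0.5               # classify as positive when score >= THRESHOLD
LEVELS = (0.02, 0.5, 0.98)    # hypothetical discrete risk scores
WEIGHTS = (0.4, 0.2, 0.4)     # hypothetical frequency of each score level


def draw_scores(rng, size):
    """Sample scores from the hypothetical discrete distribution."""
    return rng.choices(LEVELS, weights=WEIGHTS, k=size)


def draw_label(rng, score):
    """Correctly specified model: P(Y = 1 | score) equals the score."""
    return 1 if rng.random() < score else 0


def supervised_error(scores, labels):
    """Misclassification rate computed from the labeled sample only."""
    wrong = [int((s >= THRESHOLD) != (y == 1)) for s, y in zip(scores, labels)]
    return sum(wrong) / len(wrong)


def semi_supervised_error(lab_scores, labels, unlab_scores):
    """Toy two-step estimate of the misclassification rate."""
    # Step I: calibrate P(Y = 1 | score) per score level on the labeled data.
    overall = sum(labels) / len(labels)
    calibrated = {}
    for level in LEVELS:
        ys = [y for s, y in zip(lab_scores, labels) if s == level]
        # Fall back to the overall label mean if a level has no labels.
        calibrated[level] = sum(ys) / len(ys) if ys else overall
    # Step II (assumed): average the implied conditional error rate
    # over the unlabeled scores.
    total = 0.0
    for s in unlab_scores:
        p = calibrated[s]
        total += (1.0 - p) if s >= THRESHOLD else p
    return total / len(unlab_scores)


def simulate(reps=400, n_labeled=100, n_unlabeled=2000, seed=0):
    """Repeat both estimators on fresh samples and collect the estimates."""
    rng = random.Random(seed)
    sup, ss = [], []
    for _ in range(reps):
        lab_scores = draw_scores(rng, n_labeled)
        labels = [draw_label(rng, s) for s in lab_scores]
        unlab_scores = draw_scores(rng, n_unlabeled)
        sup.append(supervised_error(lab_scores, labels))
        ss.append(semi_supervised_error(lab_scores, labels, unlab_scores))
    return sup, ss


sup, ss = simulate()
# True error rate under the simulated model: sum of weight * min(p, 1 - p).
true_error = sum(w * min(p, 1 - p) for p, w in zip(LEVELS, WEIGHTS))
print("true error rate:     ", round(true_error, 4))
print("supervised mean, var:", statistics.mean(sup), statistics.variance(sup))
print("semi-sup.  mean, var:", statistics.mean(ss), statistics.variance(ss))
```

In this toy setup both estimators target the same error rate. The semi-supervised version removes the part of the sampling variance that comes from which scores happen to land in the small labeled sample, because the score distribution is estimated from the much larger unlabeled sample instead.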
Abstract
In many modern machine learning applications, the outcome is expensive or time-consuming to collect while the predictor information is easy to obtain. Semi-supervised learning (SSL) aims at utilizing large amounts of "unlabeled" data along with small amounts of "labeled" data to improve the efficiency of a classical supervised approach. Though numerous SSL classification and prediction procedures have been proposed in recent years, no methods currently exist to evaluate the prediction performance of a working regression model. In the context of developing phenotyping algorithms derived from electronic medical records (EMR), we present an efficient two-step estimation procedure for evaluating a binary classifier based on various prediction performance measures in the semi-supervised (SS) setting. In step I, the labeled data is used to obtain a non-parametrically calibrated estimate of…
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Data Classification · Machine Learning and Algorithms
