Semi-Supervised Approaches to Efficient Evaluation of Model Prediction Performance

Jessica Gronsbell; Tianxi Cai

arXiv:1711.05663·stat.ME·November 16, 2017·1 cites

TL;DR

This paper introduces a semi-supervised method for evaluating the prediction performance of a working regression model when labeled data is costly, as in medical applications, and reports more efficient (lower-variance) estimates than supervised methods.

Contribution

It proposes a novel two-step semi-supervised estimation procedure for model evaluation that reduces variance and corrects for overfitting, applicable to phenotyping algorithms from electronic medical records.

Findings

01 The semi-supervised (SS) estimators are consistent and asymptotically normal.

02 The SS estimators have smaller variance than their supervised counterparts under correct model specification.

03 The method performs well in simulations and in real EMR studies.

Abstract

In many modern machine learning applications, the outcome is expensive or time-consuming to collect while the predictor information is easy to obtain. Semi-supervised learning (SSL) aims at utilizing large amounts of "unlabeled" data along with small amounts of "labeled" data to improve the efficiency of a classical supervised approach. Though numerous SSL classification and prediction procedures have been proposed in recent years, no methods currently exist to evaluate the prediction performance of a working regression model. In the context of developing phenotyping algorithms derived from electronic medical records (EMR), we present an efficient two-step estimation procedure for evaluating a binary classifier based on various prediction performance measures in the semi-supervised (SS) setting. In step I, the labeled data is used to obtain a non-parametrically calibrated estimate of…
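The abstract is cut off above, so the paper's exact estimator is not shown here. As a rough illustration of the general idea only, the sketch below assumes a common two-step pattern: calibrate P(Y = 1 | score) on the labeled data with a kernel smoother, then average the calibrated probabilities over the unlabeled scores to estimate the true and false positive rates at a threshold. The function names `kernel_calibrate` and `ss_tpr_fpr`, the Gaussian kernel, and the default bandwidth are all hypothetical choices for this sketch, not taken from the paper.

```python
import numpy as np


def kernel_calibrate(s_lab, y_lab, s_new, bandwidth):
    """Nadaraya-Watson estimate of P(Y = 1 | score) at each score in s_new.

    Assumption for this sketch: a Gaussian kernel with a fixed bandwidth.
    """
    s_lab = np.asarray(s_lab, dtype=float)
    y_lab = np.asarray(y_lab, dtype=float)
    s_new = np.asarray(s_new, dtype=float)
    # Pairwise scaled distances: one row per new score, one column per labeled score.
    diffs = (s_new[:, None] - s_lab[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    # Kernel-weighted average of the labeled outcomes.
    return weights @ y_lab / weights.sum(axis=1)


def ss_tpr_fpr(s_lab, y_lab, s_unlab, threshold, bandwidth=0.1):
    """Hypothetical semi-supervised estimates of TPR and FPR at a threshold.

    Step 1: calibrate outcome probabilities using the labeled data only.
    Step 2: average those probabilities over the unlabeled scores.
    """
    s_unlab = np.asarray(s_unlab, dtype=float)
    p_hat = kernel_calibrate(s_lab, y_lab, s_unlab, bandwidth)
    flagged = s_unlab > threshold  # classifier calls these positive
    tpr = np.sum(p_hat * flagged) / np.sum(p_hat)
    fpr = np.sum((1.0 - p_hat) * flagged) / np.sum(1.0 - p_hat)
    return tpr, fpr
```

Calling `ss_tpr_fpr(s_lab, y_lab, s_unlab, threshold=0.5)` with a few hundred labeled scores and several thousand unlabeled scores returns a `(tpr, fpr)` pair. The bandwidth should be chosen so that every unlabeled score lies near some labeled score; otherwise the kernel weights can underflow to zero.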

Taxonomy

Topics: Statistical Methods and Inference · Machine Learning and Data Classification · Machine Learning and Algorithms