Performance of weakly-supervised electronic health record-based phenotyping methods in rare-outcome settings

Yunjing Hong; Jennifer C. Nelson; Brian D. Williamson

arXiv:2604.09913 · stat.ME · April 14, 2026

TL;DR

This study evaluates weakly-supervised phenotyping methods based on electronic health records in rare-outcome settings, highlighting their variable performance and the importance of parameter tuning.

Contribution

The paper provides a comprehensive comparison of three weakly-supervised methods (PheNorm, MAP, and sureLDA) across diverse simulation settings for rare outcomes.

Findings

01

No single method outperformed the others across all metrics.

02

sureLDA often performed well in simulations.

03

Performance is highly sensitive to tuning parameters.

Abstract

Accurately identifying patients with specific medical conditions is a key challenge when using clinical data from electronic health records. Our objective was to comprehensively assess when weakly-supervised prediction methods, which use silver-standard labels (proxy measures of the true outcome) rather than gold-standard true labels, perform well in rare-outcome settings like vaccine safety studies. We compared three methods (PheNorm, MAP, and sureLDA) that combine structured features and features derived from clinical text using natural language processing, through an extensive simulation study with data-generating mechanisms ranging from simple to complex, varying outcome rates, and varying degrees of informative silver labels. We also considered using predicted probabilities to design a chart review validation study. No single method dominated the other across all prediction…
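To make the idea of a silver-standard label in a rare-outcome setting concrete, here is a minimal toy sketch in Python. It is not the paper's simulation design and does not implement PheNorm, MAP, or sureLDA. The 1% outcome rate, the Poisson rates for the count of outcome-related mentions, and the function names `simulate_silver_labels` and `rank_auc` are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_silver_labels(n=20000, prevalence=0.01, seed=0):
    """Toy data: a rare binary outcome and a noisy count-based silver label.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Gold-standard outcome, unobserved in practice without chart review.
    y = rng.binomial(1, prevalence, size=n)
    # Silver label: count of outcome-related mentions, higher on average in true cases.
    silver = rng.poisson(lam=np.where(y == 1, 3.0, 0.2))
    return y, silver


def rank_auc(y, score):
    """AUC via the Mann-Whitney rank statistic, using average ranks for ties."""
    _, inverse, counts = np.unique(score, return_inverse=True, return_counts=True)
    # Average 1-indexed rank of each distinct score value.
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


y, silver = simulate_silver_labels()
auc_value = rank_auc(y, silver)

# Positive predictive value when flagging anyone with at least one mention.
flagged = silver >= 1
ppv = y[flagged].mean()

print(f"observed outcome rate: {y.mean():.4f}")
print(f"AUC of silver label:   {auc_value:.3f}")
print(f"PPV at silver >= 1:    {ppv:.3f}")
```

In a setup like this, a silver label can rank cases well (high AUC) while the flagged group is still mostly non-cases (low PPV), simply because the outcome is rare. This is one reason different performance metrics can disagree in rare-outcome settings.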
