Towards the Efficient Inference by Incorporating Automated Computational Phenotypes under Covariate Shift
Chao Ying, Jun Jin, Yi Guo, Xiudi Li, Muxuan Liang, Jiwei Zhao

TL;DR
This paper introduces a semi-supervised learning approach that effectively incorporates automated computational phenotypes (ACPs) under covariate shift, improving inference efficiency while maintaining validity in real-world biomedical data analysis.
Contribution
It develops doubly robust, semiparametrically efficient estimators that leverage ACPs for both labeled and unlabeled data under covariate shift, with a focus on efficiency gains from unlabeled data.
Findings
Incorporating ACPs for unlabeled data enhances efficiency.
The proposed estimators are doubly robust and semiparametrically efficient.
Empirical validation confirms practical advantages of the method.
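To make the phrase "doubly robust" concrete, the following is a minimal sketch of a generic, textbook-style augmented estimator for the mean of the gold-standard outcome in the unlabeled population. It is not the paper's semiparametrically efficient estimator. Everything in it is an assumption made for illustration: the function name `dr_mean_unlabeled`, the linear outcome model, the logistic density-ratio model, and the premise that the shift between labeled and unlabeled data is explained by the observed covariates and the ACP.

```python
import numpy as np


def _design(features):
    """Prepend an intercept column to an (n, d) feature matrix."""
    features = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(features.shape[0]), features])


def _fit_logistic(features, labels, n_iter=25):
    """Logistic regression via Newton-Raphson (illustrative, lightly ridged)."""
    design = _design(features)
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (labels - prob)
        hess = design.T @ (design * (prob * (1.0 - prob))[:, None])
        beta += np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    return beta


def dr_mean_unlabeled(x_lab, acp_lab, y_lab, x_unl, acp_unl):
    """Hypothetical doubly robust estimate of E[Y] in the unlabeled population."""
    feats_lab = np.column_stack([x_lab, acp_lab])
    feats_unl = np.column_stack([x_unl, acp_unl])

    # Outcome model (assumed linear): regress Y on covariates and ACP, labeled data only.
    coef, *_ = np.linalg.lstsq(_design(feats_lab), y_lab, rcond=None)
    m_lab = _design(feats_lab) @ coef
    m_unl = _design(feats_unl) @ coef

    # Density-ratio model (assumed logistic): classify unlabeled (1) vs labeled (0).
    membership = np.concatenate([np.zeros(len(feats_lab)), np.ones(len(feats_unl))])
    beta = _fit_logistic(np.vstack([feats_lab, feats_unl]), membership)
    prob_lab = 1.0 / (1.0 + np.exp(-_design(feats_lab) @ beta))
    ratio = prob_lab / (1.0 - prob_lab)
    ratio = ratio / ratio.mean()  # normalizing absorbs the class-size constant

    # Plug-in prediction on unlabeled data plus a reweighted residual correction.
    return m_unl.mean() + np.mean(ratio * (y_lab - m_lab))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x_lab = rng.normal(0.0, 1.0, n)   # labeled covariates
    x_unl = rng.normal(0.5, 1.0, n)   # shifted unlabeled covariates
    y_lab = 1.0 + 2.0 * x_lab + rng.normal(size=n)
    y_unl = 1.0 + 2.0 * x_unl + rng.normal(size=n)  # unobserved in practice
    acp_lab = y_lab + rng.normal(size=n)            # ACP as a noisy proxy of Y
    acp_unl = y_unl + rng.normal(size=n)
    print(dr_mean_unlabeled(x_lab, acp_lab, y_lab, x_unl, acp_unl))
```

The estimate combines two nuisance models, and the residual correction is what gives this kind of estimator its double robustness: it stays consistent if either the outcome model or the density-ratio model is correctly specified. In the simulated data the ACP enters only as an extra predictor that is also available for unlabeled records.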
Abstract
Collecting gold-standard phenotype data via manual extraction is typically labor-intensive and slow, whereas automated computational phenotypes (ACPs) offer a systematic and much faster alternative. However, simply replacing the gold-standard with ACPs, without acknowledging their differences, could lead to biased results and misleading conclusions. Motivated by the complexity of incorporating ACPs while maintaining the validity of downstream analyses, in this paper, we consider a semi-supervised learning setting that consists of both labeled data (with gold-standard) and unlabeled data (without gold-standard), under the covariate shift framework. We develop doubly robust and semiparametrically efficient estimators that leverage ACPs for general target parameters in the unlabeled and combined populations. In addition, we carefully analyze the efficiency gains achieved by incorporating…
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Single-cell and spatial transcriptomics
