POPDx: An Automated Framework for Patient Phenotyping across 392,246 Individuals in the UK Biobank Study
Lu Yang, Sheng Wang, and Russ B. Altman

TL;DR
POPDx is a deep learning framework that accurately imputes phenotype codes for UK Biobank participants, including rare and unobserved conditions, enhancing cohort definition for epidemiological research.
Contribution
This paper introduces POPDx, a novel bilinear machine learning method for multi-phenotype recognition that improves accuracy over existing approaches and handles incomplete data.
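As a rough illustration of what a bilinear scorer for multi-phenotype recognition can look like, the sketch below scores one patient against every phenotype as sigmoid(xᵀ W e_p). It is a minimal sketch and not the authors' implementation: the feature vector `x`, the weight matrix `W`, the phenotype embeddings `E`, and the dimensions `d` and `k` are all assumptions made for illustration; only the count of 1,538 phenotype codes comes from the abstract.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def bilinear_probabilities(x, W, E):
    """Return one probability per phenotype: sigmoid(x^T W e_p).

    x: patient feature vector of length d        (hypothetical features)
    W: d x k weight matrix                       (learned in practice; random here)
    E: list of phenotype embeddings, each of length k (hypothetical embeddings)
    """
    k = len(W[0])
    # Project the patient into the phenotype-embedding space: h = x^T W.
    h = [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(k)]
    # Score each phenotype by the dot product of h with its embedding.
    return [sigmoid(sum(h[j] * e[j] for j in range(k))) for e in E]


random.seed(0)
d, k = 8, 4            # arbitrary toy dimensions (assumption)
n_phenotypes = 1538    # number of phenotype codes stated in the abstract

x = [random.gauss(0, 1) for _ in range(d)]
W = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(d)]
E = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n_phenotypes)]

probs = bilinear_probabilities(x, W, E)
print(len(probs), min(probs), max(probs))
```

In a form like this, a phenotype's score depends on the patient only through `h` and on the phenotype only through its embedding, so a phenotype with few or no labeled patients can still be scored, provided an embedding is available for it.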
Findings
POPDx effectively predicts rare and unobserved phenotypes.
It significantly improves multi-phenotype recognition across 22 disease categories.
It enables identification of epidemiological features associated with each phenotype.
Abstract
Objective: For the UK Biobank, standardized phenotype codes are associated with patients who have been hospitalized but are missing for many patients who have been treated exclusively in an outpatient setting. We describe a method for phenotype recognition that imputes phenotype codes for all UK Biobank participants.
Materials and Methods: POPDx (Population-based Objective Phenotyping by Deep Extrapolation) is a bilinear machine learning framework for simultaneously estimating the probabilities of 1,538 phenotype codes. We extracted phenotypic and health-related information of 392,246 individuals from the UK Biobank for POPDx development and evaluation. A total of 12,803 ICD-10 diagnosis codes of the patients were converted to 1,538 Phecodes as gold standard labels. The POPDx framework was evaluated and compared to other available methods on automated multi-phenotype recognition.
Results…
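The conversion of ICD-10 diagnosis codes into Phecode labels described in the abstract can be pictured as building one multi-hot vector per patient. The following is a minimal sketch under stated assumptions: the helper `build_label_vector` is hypothetical, and the code names (`ICD_A`, `PHE_1`, and so on) are placeholders, not entries from the real ICD-10-to-Phecode map.

```python
def build_label_vector(icd10_codes, icd10_to_phecode, phecode_index):
    """Turn a patient's ICD-10 codes into a multi-hot Phecode label vector.

    icd10_codes:      iterable of ICD-10 code strings for one patient
    icd10_to_phecode: dict mapping an ICD-10 code to a Phecode (many-to-one)
    phecode_index:    dict mapping each Phecode to its position in the vector
    """
    labels = [0] * len(phecode_index)
    for code in icd10_codes:
        phecode = icd10_to_phecode.get(code)
        if phecode is not None:  # codes without a mapping are skipped
            labels[phecode_index[phecode]] = 1
    return labels


# Placeholder mapping for illustration only (assumption, not real codes).
# Several ICD-10 codes can collapse onto the same Phecode.
icd10_to_phecode = {"ICD_A": "PHE_1", "ICD_B": "PHE_1", "ICD_C": "PHE_2"}

# Fix a stable ordering so every patient's vector uses the same positions.
phecodes = sorted(set(icd10_to_phecode.values()))
phecode_index = {p: i for i, p in enumerate(phecodes)}

# "ICD_Z" has no entry in the toy map, so it contributes nothing.
patient_codes = ["ICD_A", "ICD_B", "ICD_Z"]
labels = build_label_vector(patient_codes, icd10_to_phecode, phecode_index)
print(labels)  # [1, 0]
```

With the full mapping, each vector would have 1,538 positions, one per Phecode, and could serve as the target for a multi-label classifier.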
Taxonomy
Topics: Ethics in Clinical Research · Genomics and Rare Diseases · Machine Learning in Healthcare
