Phenotype Detection in Real World Data via Online MixEHR Algorithm
Ying Xu, Romane Gauriau, Anna Decker, Jacob Oppenheim

TL;DR
This paper introduces an online version of the mixEHR algorithm for phenotyping from large-scale electronic health records and claims data, enabling scalable, unsupervised disease pattern discovery and clinical insights.
Contribution
The authors developed an online extension of the mixEHR algorithm, allowing it to handle datasets an order of magnitude larger and to discover new disease subtypes and comorbidities.
Findings
Recapitulated known disease groups
Discovered new clinically meaningful disease subtypes
Scalable to large datasets
Abstract
Understanding patterns of diagnoses, medications, procedures, and laboratory tests from electronic health records (EHRs) and health insurer claims is important for understanding disease risk and for efficient clinical development, both of which often require rules-based curation in collaboration with clinicians. We extended an unsupervised phenotyping algorithm, mixEHR, to an online version, allowing us to use it on datasets an order of magnitude larger, including a large US-based claims dataset and a rich regional EHR dataset. In addition to recapitulating previously observed disease groups, we discovered clinically meaningful disease subtypes and comorbidities. This work scaled up an effective unsupervised learning method, reinforced existing clinical knowledge, and is a promising approach for efficient collaboration with clinicians.
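The abstract does not say how the online version updates the model, so the following is only an illustrative sketch of what "online" could mean here. It assumes a stochastic variational update in the style of online topic models: a global topic-by-code parameter is blended with an estimate computed from one minibatch of patients, using a decaying step size. The function names, the hyperparameters (`tau0`, `kappa`, `eta`), and the update rule itself are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying step size for minibatch t (hypothetical defaults)."""
    return (tau0 + t) ** (-kappa)


def online_update(lam, batch_stats, batch_size, n_total, t,
                  eta=0.01, tau0=1.0, kappa=0.7):
    """Blend the global parameter with one minibatch estimate.

    lam         : (K, V) global topic-by-code parameter
    batch_stats : (K, V) expected topic-code counts from this minibatch
    batch_size  : number of patients in the minibatch
    n_total     : number of patients in the full dataset
    eta         : prior pseudo-count on each topic-code entry (assumed)
    """
    rho = step_size(t, tau0, kappa)
    # Rescale the minibatch statistics as if the whole dataset looked like it.
    lam_hat = eta + (n_total / batch_size) * batch_stats
    # Weighted average of the old value and the minibatch estimate.
    return (1.0 - rho) * lam + rho * lam_hat


# Usage: stream minibatches so the full dataset never has to fit in memory.
rng = np.random.default_rng(0)
K, V = 3, 5                      # topics, distinct clinical codes (toy sizes)
lam = np.ones((K, V))
for t in range(4):
    stats = rng.random((K, V))   # stand-in for a real E-step over a minibatch
    lam = online_update(lam, stats, batch_size=10, n_total=1000, t=t)
```

Because each step touches only one minibatch, memory use depends on the batch size rather than on the number of patients, which is the usual motivation for online variants of this kind.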
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare · AI in cancer detection
