Artificial Intelligence-Enabled Comprehensive Electronic Health Record Phenotyping at a Large Scale
Niels Turley, Marta Fernandes, Aditya Gupta, Manohar Ghanta, Haoqi Sun, Robert Thomas, Sahar Zafar, M Brandon Westover

TL;DR
This paper uses AI to identify ten diseases from the electronic health records of 145,787 patients, reaching AUCs above 0.95 with balanced training data and cross-validation.
Contribution
A novel AI-enabled EHR phenotyping framework using balanced data and cross-validation for ten diseases in a large multi-site dataset.
Findings
AI-enabled EHR phenotyping achieved AUCs above 0.95 and AUPRCs above 0.83 across ten diseases.
Manual annotation and balanced dataset design improved model training and accuracy.
Detailed error analysis provided insights into false positives and negatives for each disease.
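For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how AUC (area under the ROC curve) and AUPRC (summarized here as average precision) can be computed from binary labels and model scores. It is an illustration only and not the authors' evaluation code; the function names and the toy labels and scores are assumptions made for this example.

```python
def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive (one common AUPRC summary)."""
    # Sort by descending score; tied scores keep their input order in this sketch.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (hypothetical): two patients with the disease, two without.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores), average_precision(labels, scores))
```

AUPRC depends on disease prevalence, while AUC does not, which is one reason the two are usually reported together.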
Abstract
The electronic health record (EHR) contains rich and ever-growing information, especially for the gerontologic population with multiple comorbidities. With the advent of powerful artificial intelligence (AI) tools, we can perform accurate EHR phenotyping, which is the foundation for downstream analyses. Here, we performed EHR phenotyping of ten diseases in a large multi-site clinical dataset of 145,787 unique patients, including epilepsy (and subtypes), ischemic stroke, subarachnoid hemorrhage, subdural hematoma, Alzheimer’s disease and related dementias, Parkinson’s disease, cardiac arrest, traumatic brain injury, brain tumor, and congestive heart failure. We used AI-enabled natural language processing that extracts the presence of keywords from unstructured clinical notes while considering negations, as well as structured diagnosis codes (ICD) and medications. We used logistic…
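The negation-aware keyword extraction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the negation cue list, the five-token look-back window, the sentence splitting rule, and the function name `keyword_features` are all assumptions chosen for illustration, and a real system would need a far richer rule set.

```python
import re

# Hypothetical negation cues; the abstract does not list the rules actually used.
NEGATION_CUES = ("no", "not", "denies", "without", "negative for")
WINDOW = 5  # tokens before a keyword that are searched for a cue (assumption)


def keyword_features(note, keywords):
    """Return {keyword: 1 if it appears un-negated at least once in the note, else 0}."""
    features = {}
    sentences = re.split(r"[.!?\n]+", note.lower())
    for kw in keywords:
        features[kw] = 0
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        for sentence in sentences:
            for match in re.finditer(pattern, sentence):
                # Look only at the few tokens immediately before the keyword.
                preceding = sentence[:match.start()].split()[-WINDOW:]
                context = " " + " ".join(preceding) + " "
                negated = any(" " + cue + " " in context for cue in NEGATION_CUES)
                if not negated:
                    features[kw] = 1
    return features


note = (
    "Patient denies seizure activity. "
    "History of ischemic stroke in 2019. "
    "No evidence of subdural hematoma."
)
print(keyword_features(note, ["seizure", "ischemic stroke", "subdural hematoma"]))
```

Binary features of this kind could then be combined with indicators for structured diagnosis codes and medications as inputs to a classifier.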
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Electronic Health Records Systems
