Development and validation of a machine learning model to identify individuals at high risk for psychotic disorders using medical record data
Ben J. Marafino, Andrea H. Kline-Simon, Icelini Stavers-Sosa, David J. Cronkite, Lawrence D. Gerstley, Cimone Durojaiye, Ann Kelley, Linda Kiel, Arvind Ramaprasan, David S. Carrell, Robert B. Penfold, Matthew E. Hirschtritt

TL;DR
The study developed a machine learning model that uses electronic health record (EHR) data to identify young people at high risk of developing psychotic disorders. The best model reached an AUC of 0.827 but was poorly calibrated because of the low incidence of these disorders.
Contribution
A novel machine learning model using EHR data to identify high-risk individuals for psychosis in routine clinical settings.
Findings
A gradient-boosting model with text features achieved the highest AUC (0.827) for predicting psychosis risk.
Model performance was consistent across subgroups but suffered from poor calibration due to the low incidence of psychotic-spectrum disorder (PSD).
Restricting prediction to higher-risk populations could improve model calibration.
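The findings above separate two properties of a risk model: discrimination (AUC) and calibration. The sketch below is a minimal illustration on synthetic data, not the study's model or data. The cohort size, the Beta(0.5, 20) risk distribution, and the square-root "inflation" of risks are all assumptions chosen for illustration. It shows that two sets of scores with the same ranking have the same AUC even when one of them badly overstates risk for a rare outcome.

```python
import math
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a random case outscores a random non-case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean(values):
    return sum(values) / len(values)


# Hypothetical cohort with a rare outcome (a few percent incidence).
rng = random.Random(0)
n_people = 5000
true_risk = [rng.betavariate(0.5, 20) for _ in range(n_people)]
labels = [1 if rng.random() < r else 0 for r in true_risk]

# Two hypothetical score sets with the same ordering of individuals:
# one equal to the true risk, one inflated by a monotone transform.
calibrated = true_risk
inflated = [math.sqrt(r) for r in true_risk]

incidence = mean(labels)
auc_calibrated = auc(calibrated, labels)
auc_inflated = auc(inflated, labels)

print(f"observed incidence:        {incidence:.3f}")
print(f"mean predicted risk (cal): {mean(calibrated):.3f}")
print(f"mean predicted risk (inf): {mean(inflated):.3f}")
print(f"AUC (cal): {auc_calibrated:.3f}  AUC (inf): {auc_inflated:.3f}")
```

Comparing mean predicted risk with observed incidence is only the coarsest calibration check (calibration-in-the-large). A fuller assessment would compare predicted and observed risk within bins of predicted risk.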
Abstract
Reducing the duration of untreated psychosis among individuals with early psychosis is associated with improved clinical outcomes and decreased long-term impairment. However, timely identification of individuals at high risk for psychotic disorders in routine clinical practice is challenging, and many individuals are only identified several years following psychotic-symptom onset. This study aimed to leverage comprehensive electronic medical records to develop and validate a machine learning model to identify individuals at high risk of conversion to a psychotic-spectrum disorder (PSD). This was a cross-sectional, retrospective analysis of electronic health record (EHR) data consisting of clinician free-text documentation and structured data (i.e., age, sex, race/ethnicity, psychiatric diagnoses, encounter modality, and department) among 406,268 Kaiser Permanente Northern California…
Taxonomy
Topics: Machine Learning in Healthcare · Schizophrenia research and treatment · Phosphodiesterase function and regulation
