# Developing and Validating a Machine Learning Algorithm to Predict the Risk of Incident Opioid Use Disorder Among OneFlorida+ Patients: Prognostic Modeling Study

**Authors:** Jabed Al Faysal, Weihsuan Lo-Ciganic, Walid F Gellad, Yonghui Wu, Christopher A Harle, Khoa Nguyen, James L Huang, Gerald Cochran, Debbie L Wilson, Stephanie AS Staras, Siegfried OF Schmidt, Eric I Rosenberg, Danielle Nelson, Shunhua Yan, Gary M Reisfield, William M Greene, Courtney Kuza, Md Mahmudul Hasan

PMC · DOI: 10.2196/79482 · 2026-03-05

## TL;DR

This study developed and externally validated a machine learning model that uses electronic health records (EHRs) to predict the 3-month risk of incident opioid use disorder (OUD) in adults starting opioid therapy, which could support early prevention efforts.

## Contribution

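The main contribution is a machine learning model, developed on OneFlorida+ EHR data and externally validated on UPMC data, that predicts 3-month incident opioid use disorder risk in adults starting opioid therapy and stratifies them into clinically actionable risk deciles.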

## Key findings

- The gradient boosting machine (GBM) model achieved a C-statistic of 0.879 for predicting 3-month incident opioid use disorder (OUD) in the internal validation sample and 0.756 in external validation on UPMC data.
- Patients in the model's top predicted-risk decile accounted for ~68% of those who developed OUD.
- The model showed acceptable false negative rate parity across race, age, and sex.
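The fairness finding concerns false negative rate (FNR) parity: whether patients who go on to develop OUD are missed at similar rates across demographic groups. The sketch below is a minimal, dependency-free illustration, assuming binary high-risk flags (for example, membership in the top risk decile) and hypothetical group labels; the function names are illustrative, and the study's exact fairness metrics and thresholds are not given in this summary.

```python
from collections import defaultdict


def false_negative_rates(y_true, y_flag, groups):
    """Per-group FNR = FN / (TP + FN): share of actual OUD cases not flagged high risk.

    Groups with no OUD cases are omitted because their FNR is undefined.
    """
    cases = defaultdict(int)   # patients who developed OUD, per group
    missed = defaultdict(int)  # of those, patients the model did not flag
    for y, flag, g in zip(y_true, y_flag, groups):
        if y == 1:
            cases[g] += 1
            if flag == 0:
                missed[g] += 1
    return {g: missed[g] / cases[g] for g in cases}


def fnr_parity_ratios(fnrs, reference):
    """Each group's FNR divided by the reference group's FNR (1.0 means parity)."""
    ref = fnrs[reference]
    if ref == 0:
        raise ValueError("reference group has FNR 0; ratios are undefined")
    return {g: rate / ref for g, rate in fnrs.items()}


# Toy example with hypothetical groups "A" and "B" (not study data).
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
y_flag = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
fnrs = false_negative_rates(y_true, y_flag, groups)
print(fnrs)
print(fnr_parity_ratios(fnrs, reference="A"))
```

Treating ratios within a band such as 0.8 to 1.25 of the reference group as acceptable is one common convention; it is an assumption here, not the study's stated criterion.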

## Abstract

Opioid use disorder (OUD) remains a critical public health crisis in the United States. Despite widespread policy and clinical interventions, identifying individuals at risk of developing OUD early is still challenging because traditional screening approaches are limited and individualized risk stratification methods are lacking. Machine learning (ML) methods offer an opportunity to develop timely, high-performing, and explainable predictive models that can enhance OUD prevention strategies in clinical settings.

This study aims to develop and validate an ML model using electronic health record (EHR) data to predict the 3-month risk of incident OUD among adults initiating opioid therapy and to stratify patients into clinically actionable risk groups.
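Risk stratification in this study is based on deciles of predicted risk, summarized by positive predictive value (PPV), negative predictive value (NPV), and number needed to evaluate (NNE = 1/PPV). A minimal stdlib sketch follows, assuming a list of predicted probabilities and binary 3-month OUD outcomes, with the top decile treated as the high-risk flag; the function names are hypothetical, and ties are broken by input order.

```python
def assign_deciles(scores):
    """Decile per patient: 1 = highest predicted risk, 10 = lowest.

    Ties keep input order (Python's sort is stable even with reverse=True).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1  # ranks 0..n-1 map onto deciles 1..10
    return deciles


def top_decile_metrics(y_true, scores):
    """PPV, NPV, NNE, and share of OUD cases captured when the top decile is flagged.

    Assumes at least two patients and at least one OUD case.
    """
    flagged = [d == 1 for d in assign_deciles(scores)]
    tp = sum(1 for y, f in zip(y_true, flagged) if f and y == 1)
    fp = sum(1 for y, f in zip(y_true, flagged) if f and y == 0)
    fn = sum(1 for y, f in zip(y_true, flagged) if not f and y == 1)
    tn = sum(1 for y, f in zip(y_true, flagged) if not f and y == 0)
    ppv = tp / (tp + fp)
    return {
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "nne": 1 / ppv if ppv > 0 else float("inf"),  # patients evaluated per true case
        "captured": tp / (tp + fn),  # sensitivity of the top-decile flag
    }


# Toy example (not study data): 20 patients, 2 OUD cases.
scores = [i / 20 for i in range(20)]
y_true = [0] * 20
y_true[19] = 1  # highest score: caught by the top decile
y_true[5] = 1   # low score: missed
print(top_decile_metrics(y_true, scores))
```

As a sanity check against the reported figures, a top-decile PPV of 3.26% gives `1 / 0.0326`, about 30.7, which rounds to the reported NNE of 31.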

This prognostic modeling study used 2017‐2022 OneFlorida+ EHR data to develop and validate ML algorithms predicting 3-month incident OUD risk. We included 182,083 adults (≥18 y) with no history of cancer, overdose, OUD, or hospice care who received ≥1 outpatient, noninjectable opioid prescription. Using 183 predictors measured in sequential 3-month intervals, we developed elastic net, least absolute shrinkage and selection operator (LASSO), gradient boosting machine (GBM), and random forest models on randomly split training, testing, and validation sets. Model performance was assessed using C-statistics, predictive values, and the number needed to evaluate, and patients were stratified into risk deciles to assess clinical applicability. Model explainability was assessed using Shapley additive explanations (SHAP), and fairness was evaluated using standard metrics. We externally validated the best-performing model in an independent cohort from the 2018‐2020 UPMC (formerly University of Pittsburgh Medical Center) health system.
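Predictors are measured in sequential 3-month intervals, and the outcome is incident OUD in the following 3 months. The sketch below shows one way such patient-period windows could be built. It assumes calendar-month periods anchored at a hypothetical index date (for example, the first opioid fill) and that follow-up stops at the first OUD diagnosis; these choices and all function names are illustrative assumptions, not the study's published pipeline.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole calendar months, clamping to the month's last day."""
    years, month0 = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def three_month_periods(index_date, end_date):
    """Consecutive [start, stop) 3-month periods from the index date until end_date."""
    periods = []
    k = 0
    start = index_date
    while start < end_date:
        # Offset from the index date each time so month-end clamping does not drift.
        stop = add_months(index_date, 3 * (k + 1))
        periods.append((start, stop))
        start = stop
        k += 1
    return periods


def prediction_pairs(index_date, end_date, oud_dates):
    """Pair period k (where predictors would be measured) with a 0/1 label for
    OUD diagnosed during period k + 1; follow-up stops at the first OUD."""
    periods = three_month_periods(index_date, end_date)
    pairs = []
    for k in range(len(periods) - 1):
        nxt_start, nxt_stop = periods[k + 1]
        label = int(any(nxt_start <= d < nxt_stop for d in oud_dates))
        pairs.append((periods[k], label))
        if label:
            break  # incident outcome observed: no further periods for this patient
    return pairs


# Hypothetical patient: index date 2018-01-15, follow-up to 2019-01-15, OUD on 2018-08-01.
pairs = prediction_pairs(date(2018, 1, 15), date(2019, 1, 15), [date(2018, 8, 1)])
for (start, stop), label in pairs:
    print(start, stop, label)
```

Anchoring every period to the index date, rather than chaining from the previous stop, keeps a patient whose index date falls on the 31st from drifting to earlier days of the month over time.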

In the validation sample (n=60,694), GBM (C-statistic=0.879, 95% CI 0.874‐0.884) and elastic net (C-statistic=0.872, 95% CI 0.867‐0.877) outperformed least absolute shrinkage and selection operator (C-statistic=0.846, 95% CI 0.840‐0.851) and random forest (C-statistic=0.798, 95% CI 0.792‐0.804), and the GBM model required the fewest predictors (n=75) for predicting 3-month incident OUD. When the GBM algorithm was used to predict OUD risk over the subsequent 3 months, using the top decile as the high-risk cutoff yielded a positive predictive value of 3.26%, a negative predictive value of 99.8%, and a number needed to evaluate of 31. The top decile (n=6696) captured ~68% of patients who developed OUD. Shapley additive explanations analysis identified age, number of outpatient visits, history of back and other pain conditions, comorbidity burden, and opioid prescribing patterns as the strongest predictors of incident OUD. Fairness assessment showed acceptable false negative rate parity across race, age, and sex. In external validation on the UPMC cohort, the GBM model maintained good discrimination (C-statistic=0.756, 95% CI 0.750‐0.762) and effective risk stratification.
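The C-statistics reported above are areas under the ROC curve: the probability that a randomly chosen patient who develops OUD receives a higher predicted risk than a randomly chosen patient who does not. A minimal stdlib sketch using the Mann-Whitney rank-sum formulation is shown below, assuming 0/1 outcome labels; in practice, scikit-learn's `roc_auc_score` computes the same quantity. The 95% CIs would additionally need a resampling or analytic variance method, which the abstract does not specify.

```python
def c_statistic(y_true, scores):
    """C-statistic (ROC AUC) via the Mann-Whitney rank-sum formulation.

    Equals the probability that a randomly chosen OUD case gets a higher score
    than a randomly chosen non-case; tied scores count as 1/2.
    Assumes y_true contains only 0 and 1.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores order[i..j] and give each its average rank.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (not study data): 3 of the 4 case/non-case pairs are ordered correctly.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The rank formulation runs in O(n log n), so it scales to validation samples of tens of thousands of patients, unlike a direct loop over all case/non-case pairs.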

An ML algorithm for predicting incident OUD, derived from OneFlorida+ EHR data, performed well in external validation using UPMC data. The algorithm may be valuable for incident OUD risk prediction and stratification across health systems and could help inform early intervention.

## Full-text entities

- **Diseases:** cancer (MESH:D009369), back and other pain conditions (MESH:D001416), overdose (MESH:D062787), OUD (MESH:D009293)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12978897/full.md

---
Source: https://tomesphere.com/paper/PMC12978897