# Deploying machine learning models in clinical settings: a real-world feasibility analysis for a model identifying adult-onset type 1 diabetes initially classified as type 2

**Authors:** Irene Brusini, Suyin Lee, Jacob Hollingsworth, Amanda Sees, Matthew Hackenberg, Harm Scherpbier, Raquel López-Díez, Nadejda Leavitt

PMC · DOI: 10.1093/jamiaopen/ooaf133 · 2025-10-26

## TL;DR

This study tests how well a machine learning model can identify adult-onset type 1 diabetes misclassified as type 2 in real-world health data and evaluates the challenges of deploying such models in clinical settings.

## Contribution

To the authors' knowledge, this is the first evaluation on real-world health information exchange (HIE) data of a machine learning model for identifying adult-onset type 1 diabetes initially coded as type 2.

## Key findings

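- The national model (an existing model trained on national US EMR data) performed well on HIE data (AUROC = 0.751; precision at 5% recall [PR5] = 25.5%), and retraining it on the regional dataset (localization) improved performance further (AUROC = 0.774; PR5 = 35.4%).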
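- Testing on HIE data required several compatibility adjustments, which highlights the importance of aligning algorithm design with deployment needs; differences in the two models' top predictors reflected discrepancies between the datasets and gaps in HIE data capture.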
- Data inconsistencies across HIE member organizations could undermine model accuracy and provider trust in ML tools.

## Abstract

This study evaluates the performance and deployment feasibility of a machine learning (ML) model to identify adult-onset type 1 diabetes (T1D) initially coded as type 2 on electronic medical records (EMRs) from a health information exchange (HIE). To our knowledge, this is the first evaluation of such a model on real-world HIE data.

An existing ML model, trained on national US EMR data, was tested on a regional HIE dataset after several adjustments for compatibility. A localized model, retrained on the regional dataset, was then compared to the national model. Discrepancies between the 2 datasets’ features and cohorts were also investigated.

The national model performed well on HIE data (AUROC = 0.751; precision at 5% recall [PR5] = 25.5%), and localization further improved performance (AUROC = 0.774; PR5 = 35.4%). Differences in the 2 models’ top predictors reflected the discrepancies between the datasets and gaps in HIE data capture.
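For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUROC and precision at a fixed recall level could be computed from labels and model scores. It is not the authors' evaluation code: the function names `auroc` and `precision_at_recall`, the toy data, and the specific convention used for PR5 (precision at the first score-ranked cutoff where recall reaches 5%) are assumptions made for illustration, and the paper may define the operating point differently.

```python
import numpy as np


def auroc(y_true, scores):
    """Rank-based AUROC (Mann-Whitney U statistic), using average ranks for ties."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = np.argsort(s, kind="mergesort")
    sorted_s = s[order]
    ranks = np.empty(len(s), dtype=float)
    i = 0
    while i < len(s):
        # Find the run of tied scores [i, j] and give each the average rank.
        j = i
        while j + 1 < len(s) and sorted_s[j + 1] == sorted_s[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def precision_at_recall(y_true, scores, recall_target=0.05):
    """Precision at the first ranked cutoff whose recall reaches recall_target.

    Assumed convention for illustration; tied scores are not treated specially.
    """
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    n_pos = int(y.sum())
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = np.argsort(-s, kind="mergesort")  # highest scores first
    tp = np.cumsum(y[order])                  # true positives among the top-k
    recall = tp / n_pos
    k = int(np.argmax(recall >= recall_target))  # first index reaching the target
    return float(tp[k] / (k + 1))


# Toy data, invented purely to exercise the functions (not from the study).
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print("AUROC:", auroc(toy_labels, toy_scores))
print("Precision at 5% recall:", precision_at_recall(toy_labels, toy_scores, 0.05))
```

Under this convention, precision at a low recall level describes how pure the very top of the ranked list is, which is the quantity that matters when only the highest-risk patients can be flagged for follow-up.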

The adjustments needed for testing on HIE data highlight the importance of aligning algorithm design with deployment needs. Moreover, localization increased precision, which makes the localized model more appealing for patient screening, but it added complexity and may limit scalability. Additionally, while HIEs offer opportunities for large-scale deployment, data inconsistencies across member organizations could undermine accuracy and providers’ trust in ML-based tools.

Our findings offer valuable insights into the feasibility of at-scale deployment of ML models for high-risk patient identification. Although this work focuses on detecting potentially misclassified T1D, our learnings can also inform other applications.

## Linked entities

- **Diseases:** type 1 diabetes (MONDO:0005147), type 2 diabetes (MONDO:0005148)

## Full-text entities

- **Diseases:** T1D (MESH:D003922), type 2 (MESH:D003924)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12557313/full.md

---
Source: https://tomesphere.com/paper/PMC12557313