# Development of a screening model for APL using cell population data and deep learning-extracted WBC scattergram features

**Authors:** Qi Cai, Bo Ye, Wenbo Zheng, Shihong Zhang, Jingxian Zhang, Yimin Shen, Donglan Yao, Huihui Zhang, Zhixi Huang, Jian Hu, Yushuai Ma, Jianbiao Wang, Yong Wang

PMC · DOI: 10.1186/s12885-025-15034-7 · 2025-11-07

## TL;DR

This paper introduces a two-stage machine learning model that screens for acute promyelocytic leukemia (APL) using routine blood test data alone, which could help hospitals with limited resources suspect APL sooner.

## Contribution

A novel two-stage machine learning model for APL screening using deep learning-extracted scattergram features and routine lab data.

## Key findings

- The RFC-S model achieved an AUC of 0.9893 in testing and 0.9979 in external validation.
- The model maintains 98.15% sensitivity and 95.52% specificity without requiring additional tests.
- SHAP analysis confirmed scattergram-derived features like N_APL_Ratio_YZ are key predictors.

## Abstract

Acute promyelocytic leukemia (APL), a high-risk subtype of acute myeloid leukemia, necessitates rapid diagnosis upon hospital admission to mitigate early mortality. Current diagnostic approaches rely on time-consuming genetic testing or morphological expertise, both of which are particularly hard to access in resource-limited settings. This study introduces a novel machine learning approach that leverages routine lab data to raise immediate suspicion of APL, offering a new diagnostic possibility for under-resourced hospitals.

We developed a two-stage machine learning model using multi-center retrospective data. The cohort included 94 confirmed APL patients (2020–2024) from three tertiary hospitals, with an external validation set (n = 541) from an independent center. Using four VGG-16 networks, we extracted APL-specific 3D scatterplot features from DIFF and WNB channels of routine blood tests. These features were then fed into an optimized random forest classifier-scatterplot (RFC-S) model, refined via recursive feature elimination and threshold tuning.
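The abstract mentions threshold tuning without giving the procedure. As a minimal sketch of what such a step can look like, the snippet below picks the highest decision threshold that still reaches a target sensitivity, which also maximizes specificity under that constraint. The selection rule, the function names (`sensitivity_specificity`, `tune_threshold`), and the toy scores are all assumptions for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def sensitivity_specificity(
    y_true: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold means 'positive'."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def tune_threshold(
    y_true: List[int], scores: List[float], min_sensitivity: float
) -> float:
    """Hypothetical tuning rule: highest threshold whose sensitivity meets the target.

    Sensitivity can only grow as the threshold drops, so scanning candidate
    thresholds from high to low and stopping at the first hit keeps
    specificity as high as the sensitivity constraint allows.
    """
    for t in sorted(set(scores), reverse=True):
        sens, _ = sensitivity_specificity(y_true, scores, t)
        if sens >= min_sensitivity:
            return t
    raise ValueError("no threshold reaches the requested sensitivity")


# Toy validation data (made up): 1 = APL, 0 = non-APL; scores are classifier probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.62, 0.30, 0.55, 0.40, 0.20, 0.10, 0.05, 0.02]

threshold = tune_threshold(y_true, scores, min_sensitivity=0.75)
sens, spec = sensitivity_specificity(y_true, scores, threshold)
print(threshold, sens, spec)
```

For a screening tool, a rule of this shape favors catching every possible APL case and accepts some false alarms in exchange; the authors' actual criterion may differ.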

The RFC-S model achieved near-perfect discrimination, with an AUC of 0.9893 in the test set and 0.9979 in external validation. It maintained 98.15% sensitivity and 95.52% specificity, outperforming conventional methods. SHAP analysis confirmed that key scattergram-derived features (e.g., N_APL_Ratio_YZ) drove predictions. Critically, the model requires no additional tests, making it deployable even in low-resource clinics.
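For readers less familiar with the AUC figures quoted above, the following sketch computes AUC from its rank-based definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; this is not the paper's evaluation code or data.

```python
from typing import List


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    pairs; tied scores contribute 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = APL, 0 = non-APL.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.95, 0.80, 0.62, 0.30, 0.55, 0.40, 0.20, 0.10, 0.05, 0.02]

auc = roc_auc(labels, probs)
print(round(auc, 4))
```

Unlike sensitivity and specificity, AUC does not depend on a chosen decision threshold, which is why the abstract reports both kinds of metric.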

The RFC-S model represents an innovative approach to APL screening by combining deep learning-derived scattergram features with routine blood parameters. This two-stage methodology achieves high diagnostic accuracy (AUC > 0.98) while maintaining computational efficiency. Importantly, the model’s ability to utilize existing laboratory data without requiring additional tests makes it particularly valuable for resource-constrained settings where access to genetic testing or hematological expertise may be limited. Our findings suggest this approach could serve as a practical tool for early APL identification, potentially reducing diagnostic delays in diverse clinical environments.

The online version contains supplementary material available at 10.1186/s12885-025-15034-7.

## Linked entities

- **Diseases:** Acute promyelocytic leukemia (MONDO:0012883), acute myeloid leukemia (MONDO:0015667)

## Full-text entities

- **Diseases:** APL (MESH:D015473), acute myeloid leukemia (MESH:D015470)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12593920/full.md

---
Source: https://tomesphere.com/paper/PMC12593920