# Development and application of machine learning models for hematological disease diagnosis using routine laboratory parameters: a user-friendly diagnostic platform

**Authors:** Jingya Liu, Yang Gou, Wuchen Yang, Hao Wang, Jing Zhang, Shengwang Wu, Siheng Liu, Tinglu Tao, Yongjie Tang, Cheng Yang, Siyin Chen, Ping Wang, Yimei Feng, Cheng Zhang, Shuiqing Liu, Xiangui Peng, Xi Zhang

PMC · DOI: 10.3389/fmed.2025.1605868 · Frontiers in Medicine · 2025-10-01

## TL;DR

This study develops machine learning models and a user-friendly platform to accurately diagnose hematological diseases using routine lab data, improving accessibility in resource-limited areas.

## Contribution

The study contributes two ensemble models, EnMod1-46 (46 features) and EnMod2-12 (12 features), that diagnose 16 hematological diseases from routine laboratory parameters, together with a diagnostic platform built on EnMod2-12.

## Key findings

- In diagnosing 16 hematological diseases, EnMod1-46 and EnMod2-12 reached accuracies of 0.804 and 0.784 and AUCs of 0.964 and 0.961, respectively, on the temporally distinct test set.
- EnMod1-46 performed comparably to senior hematologists and better than junior ones.
- A user-friendly diagnostic platform was built on the simpler 12-feature model, EnMod2-12, to improve accessibility.

## Abstract

In recent years, the incidence and detection rate of hematological diseases have risen as the social environment has changed. Early detection and diagnosis of these diseases are important for improving patients' quality of life and prognosis.

In this study, we employed 54 clinical and conventional laboratory parameters. By optimally combining multiple feature selection methods and machine learning algorithms, we developed 7 machine learning models with varying feature set sizes. We comprehensively evaluated the performance of these models, analyzed the interpretability of the optimal and simplified models using SHapley Additive exPlanations (SHAP), and compared the diagnostic performance of these two models with that of hematologists. Finally, we developed a user-friendly diagnostic platform.
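To make the interpretability step concrete, the sketch below computes exact Shapley values for a toy three-feature score. It is an illustration of the quantity SHAP estimates, not the authors' pipeline: the function `toy_score`, its coefficients, the patient values, and the single reference baseline are all hypothetical, and the SHAP library typically averages over a background dataset instead of using one baseline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Set

Features = Dict[str, float]


def toy_score(v: Features) -> float:
    """Hypothetical risk score (NOT the paper's model): rises with low PLT and low HGB."""
    low_plt = max(0.0, 150.0 - v["PLT"]) / 150.0   # PLT in 10^9/L (assumed unit)
    low_hgb = max(0.0, 120.0 - v["HGB"]) / 120.0   # HGB in g/L (assumed unit)
    return low_plt + low_hgb + low_plt * low_hgb + 0.001 * v["age"]


def shapley_values(predict: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    """Exact Shapley values by enumerating every feature subset.

    Features outside a subset are set to their baseline value.
    Cost grows as 2^n, so this only suits a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(subset: Set[str]) -> float:
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in names}
        return predict(mixed)

    phi: Features = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical patient and reference values, chosen only for illustration.
patient = {"PLT": 30.0, "HGB": 90.0, "age": 60.0}
reference = {"PLT": 250.0, "HGB": 140.0, "age": 40.0}

phi = shapley_values(toy_score, patient, reference)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the reference score, which is the additivity property that lets SHAP rank parameters by their share of a prediction.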

The results showed that the ensemble model_1 with 46 feature parameters (EnMod1-46) and the simple ensemble model_2 with 12 feature parameters (EnMod2-12) performed well in diagnosing 16 types of hematological diseases. On the temporally distinct test set_1, EnMod1-46 achieved an accuracy of 0.804 and an area under the curve (AUC) of 0.964, while EnMod2-12 attained an accuracy of 0.784 and an AUC of 0.961. On the independent external test set_2, used to further validate generalization performance, EnMod1-46 achieved an accuracy of 0.738 and an AUC of 0.973, while EnMod2-12 yielded an accuracy of 0.705 and an AUC of 0.962. SHAP analysis showed that PLT, WBC, MCV, HGB, RBC and age were significant parameters in both models. Comparative analysis of clinical diagnosis revealed that both EnMod1-46 and EnMod2-12 outperformed junior hematologists, while EnMod1-46 was comparable to senior hematologists. Based on EnMod2-12, we also developed a user-friendly diagnostic platform to facilitate risk assessment and improve access to accurate diagnosis.
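The gap between accuracy and AUC reported here is expected in a multiclass setting, since accuracy counts only top-1 hits while AUC measures how well each class is ranked. The abstract does not state how the multiclass AUC was averaged; the minimal sketch below assumes a macro-averaged one-vs-rest AUC and uses made-up probabilities for three classes rather than the study's data.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]],
                  n_classes: int) -> float:
    """Unweighted mean of per-class one-vs-rest AUCs (an assumed averaging scheme)."""
    per_class: List[float] = []
    for k in range(n_classes):
        labels = [1 if t == k else 0 for t in y_true]
        scores = [row[k] for row in proba]
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / n_classes


# Made-up predictions for 6 samples and 3 classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],  # true class 0 is ranked second, so top-1 is wrong
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
y_pred = [max(range(3), key=lambda k: row[k]) for row in proba]

acc = accuracy(y_true, y_pred)
auc = macro_ovr_auc(y_true, proba, n_classes=3)
print(f"accuracy={acc:.3f}  macro OvR AUC={auc:.3f}")
```

On this toy data the accuracy is 5/6 (about 0.833) while the macro AUC is about 0.958: the one misclassified sample still ranks its true class above most other samples, which costs a full point of accuracy but little AUC.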

This study provides an efficient and accurate screening method for hematological diseases, especially in resource-limited countries and regions.

## Full-text entities

- **Diseases:** hematological disease (MESH:D006402)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12521225/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12521225/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12521225/full.md

---
Source: https://tomesphere.com/paper/PMC12521225