# Disease classification via interpretable machine learning based on multi-center routine coagulation test

**Authors:** Feng Dong, Yaqiong Zhang, Weibu Chen, Changmin Wang, Lei Zhang, Xiaoling Gao, Xiaoli Zhang, Minghua Jiang, Guobin Xu, Ruichuang Yang, Yutong Hou, Jiandang Ma, Chuanbao Li, Jun Wu

PMC · DOI: 10.3389/fmolb.2026.1788536 · Frontiers in Molecular Biosciences · 2026-03-04

## TL;DR

This study trains machine learning models on routine coagulation test data from multiple hospitals to classify valvular heart disease and pulmonary infection, and uses SHAP and Decision Tree analysis to identify the key features that can assist clinical diagnosis.

## Contribution

The study introduces an interpretable machine learning model using multi-center coagulation data for disease classification.

## Key findings

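- LightGBM achieved the best performance on both classification tasks, with mean F1-scores of 0.8890 and 0.7233 and mean AUCs of 0.9500 and 0.8023 in ten-fold cross validation, and mean F1-scores of 0.9259 and 0.7464 and mean AUCs of 0.9493 and 0.8297 in external validation.
- SHAP and Decision Tree analysis identified key classification features, namely international normalized ratio (INR) for valvular heart disease and age for pulmonary infection.
- External validation on held-out hospitals showed strong generalization, supporting clinical diagnostic automation.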

## Abstract

This study aims to establish an interpretable machine learning model for disease classification based on multi-center CX9000 routine coagulation tests, and to identify key disease-related features that can assist clinical diagnosis.

Data from 11 hospitals were collected. An unsupervised clustering model was used to extract classification patterns, and clinical experts assigned disease labels. Multiple machine learning models, including Random Forest, SVM, Decision Tree, Naive Bayes, MLP, XGBoost, and LightGBM, were trained. Ten-fold cross validation and external validation were performed. For external validation, models were trained with data from 8 hospitals (~90%) and tested on the remaining 2 hospitals (~10%). SHAP and Decision Tree analysis were used for interpretability.
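The hospital-level external validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record fields (`hospital`, `inr`, `age`, `label`), the helper name `split_by_hospital`, the ten synthetic sites, and the random choice of held-out hospitals are all hypothetical. What it shows is that whole hospitals are held out, so no site contributes to both training and testing.

```python
import random
from collections import defaultdict


def split_by_hospital(records, n_test_hospitals=2, seed=0):
    """Hold out entire hospitals (hypothetical helper, not from the paper)."""
    by_hospital = defaultdict(list)
    for rec in records:
        by_hospital[rec["hospital"]].append(rec)

    # Shuffle the site names reproducibly, then reserve the first few for testing.
    hospitals = sorted(by_hospital)
    random.Random(seed).shuffle(hospitals)
    test_sites = set(hospitals[:n_test_hospitals])

    train = [r for h in hospitals if h not in test_sites for r in by_hospital[h]]
    test = [r for h in sorted(test_sites) for r in by_hospital[h]]
    return train, test, test_sites


# Synthetic records for illustration only; values are not real patient data.
records = [
    {"hospital": f"H{i % 10}", "inr": 1.0 + 0.1 * (i % 7), "age": 30 + i % 50, "label": i % 2}
    for i in range(200)
]

train, test, test_sites = split_by_hospital(records)
print(len(train), len(test), sorted(test_sites))
```

Splitting by site rather than by sample is what makes the test a check of cross-hospital generalization: a random row-level split would let the model see every hospital's measurement characteristics during training.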

Clear clustering patterns were observed for valvular heart disease (VHD) and pulmonary infection (PI). LightGBM achieved the best performance in both tasks. In cross validation, the mean F1-scores were 0.8890 and 0.7233, and the mean AUCs were 0.9500 and 0.8023. External validation showed strong generalization, with mean F1-scores of 0.9259 and 0.7464 and mean AUCs of 0.9493 and 0.8297. The sample visualization by t-SNE and the interpretable analysis by SHAP and Decision Trees identified some key classification features, i.e., international normalized ratio (INR) for VHD and age for PI.
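For readers less familiar with the two reported metrics, the sketch below computes a binary F1-score and AUC from first principles. It is an illustration only: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the function names are hypothetical. The paper's own evaluation pipeline is not shown in this summary.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(f1, 4), round(auc, 4))  # 0.6667 0.8889
```

F1 depends on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two metrics are reported side by side.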

Machine learning models based on multi-center coagulation tests provide effective and interpretable disease classification, supporting clinical diagnostic automation.

## Full-text entities

- **Diseases:** PI (MESH:D012141), VHD (MESH:D006349)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12996926/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12996926/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12996926/full.md

---
Source: https://tomesphere.com/paper/PMC12996926