# Development of a machine learning-based risk prediction model for early-stage pneumoconiosis: a retrospective study

**Authors:** Xin Jin, Xinghua Li, Zhaobo Guan, Hao Xu, Shaojie Li, Yaru Jiang, Lin Zhao, Wanping Wang, Zhenyu Li

PMC · DOI: 10.3389/fmed.2025.1730472 · Frontiers in Medicine · 2026-01-07

## TL;DR

This study develops a machine learning model using blood markers to predict early-stage pneumoconiosis, aiming to improve early diagnosis and intervention.

## Contribution

A novel machine learning-based risk prediction model for early pneumoconiosis using blood test indicators is developed and validated.

## Key findings

- Six blood markers (WBC, PDW, TB, ANC, ALT, AST) were identified as risk factors for pneumoconiosis.
- The SVM model outperformed other ML models in predicting pneumoconiosis risk.
- SHAP analysis provided insights into variable contributions for personalized risk assessment.
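
The abstract states that the six markers were screened with LASSO followed by multiple logistic regression. The sketch below is a minimal, dependency-free illustration of how an L1 penalty performs that kind of screening. It assumes a logistic loss fitted by proximal gradient descent (ISTA). The summary does not say which solver or penalty strength the authors used. The function names (`lasso_logistic`, `soft_threshold`) and the toy data are hypothetical. A coefficient whose loss gradient never exceeds the penalty stays at exactly zero, which is what "dropping" a candidate marker means.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink w towards 0, returning exactly 0 if |w| <= t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lam: float, lr: float = 0.1, iters: int = 2000) -> List[float]:
    """Hypothetical sketch: L1-penalised logistic regression via ISTA.

    Minimises mean log-loss + lam * sum_j |w_j|; the intercept is not penalised.
    Columns of X are assumed standardised so a single lam treats them equally.
    Returns [intercept, w_1, ..., w_p]; an exact 0 means the feature is dropped.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            r = sigmoid(z) - yi  # derivative of the log-loss w.r.t. z
            grad[0] += r / n
            for j, xj in enumerate(xi):
                grad[j + 1] += r * xj / n
        w[0] -= lr * grad[0]  # plain gradient step for the intercept
        for j in range(1, p + 1):
            # gradient step on the smooth loss, then the L1 proximal step
            w[j] = soft_threshold(w[j] - lr * grad[j], lr * lam)
    return w


if __name__ == "__main__":
    # Toy data: column 0 separates the classes; column 1 is noise with a
    # very weak association with the label.
    X, y = [], []
    for v in [-2, -1, -0.5, 0.5, 1, 2]:
        label = 1 if v > 0 else 0
        for s in [1, -1]:
            X.append([v, s + 0.05 * (2 * label - 1)])
            y.append(label)
    print(lasso_logistic(X, y, lam=0.05))  # weak column is set to exactly 0
    print(lasso_logistic(X, y, lam=0.0))   # without the penalty it is not
```

In practice each blood marker would be standardised first and `lam` chosen by cross-validation. Both are assumptions here, not details reported in the summary.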

## Abstract

Diagnosing occupational pneumoconiosis requires more accurate predictive models. This study aimed to screen routine blood indicators from physical examination data for markers associated with the early development of pneumoconiosis. It also aimed to use machine learning (ML) algorithms to build a highly sensitive and accurate clinical prediction model that supports early diagnosis and timely intervention.

Age and a range of blood test results were collected from physical examination records. Candidate predictors were screened with the Least Absolute Shrinkage and Selection Operator (LASSO) and multiple logistic regression. Nine ML models were evaluated: Logistic Regression (LR), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Random Forest (RF), Adaptive Boosting (AdaBoost), Gaussian Naïve Bayes (GNB), Multilayer Perceptron (MLP), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The models were compared on receiver operating characteristic (ROC) curves, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, decision curve analysis (DCA), calibration curves, and precision-recall (PR) curves. Shapley Additive exPlanations (SHAP) were used to interpret the model for personalized risk assessment.
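
Most of the comparison criteria listed above reduce to a few formulas. The following sketch assumes binary labels (1 = pneumoconiosis) and predicted probabilities from any of the nine models. It computes the threshold-based metrics from the confusion matrix, the ROC AUC through its rank interpretation, and the decision-curve net benefit at a threshold probability `pt`. The 0.5 default threshold and all function names are illustrative assumptions. The summary does not report the cut-offs the authors used.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Division that returns NaN instead of raising when the denominator is 0."""
    return num / den if den else float("nan")


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics; a case is called positive when prob >= threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sens = _ratio(tp, tp + fn)  # sensitivity = recall = true-positive rate
    spec = _ratio(tn, tn + fp)  # specificity = true-negative rate
    ppv = _ratio(tp, tp + fp)   # PPV = precision
    npv = _ratio(tn, tn + fn)
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "f1": _ratio(2 * ppv * sens, ppv + sens),
    }


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return _ratio(wins, len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit of 'intervene if prob >= pt': TP/n - FP/n * pt/(1-pt)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= pt)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= pt)
    return tp / n - fp / n * pt / (1 - pt)


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0]
    p = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(classification_metrics(y, p))
    print(roc_auc(y, p), net_benefit(y, p, 0.5))
```

A decision curve is `net_benefit` evaluated over a grid of `pt` values and plotted against the "treat all" and "treat none" strategies.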

Six risk variables associated with the development of pneumoconiosis were identified: White Blood Cell count (WBC), Platelet Distribution Width (PDW), Total Bilirubin (TB), Absolute Neutrophil Count (ANC), Alanine Aminotransferase (ALT) and Aspartate Aminotransferase (AST). SVM was judged the optimal model and performed well in the evaluation of clinical applicability. SHAP analysis was used to quantify the contribution of each of the 6 variables to the progression of pneumoconiosis.
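
SHAP attributes each prediction to the input features using Shapley values. With only six markers, exact Shapley values can be enumerated directly (2^5 = 32 coalitions per feature), which makes the idea concrete. This sketch assumes that a single baseline vector stands in for "absent" features. The `shap` library's explainers instead average over a background dataset or use approximations. The function `shapley_values` and the toy risk function with its coefficients are hypothetical. They are not the paper's SVM.

```python
import itertools
import math
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    p = len(x)

    def value(coalition: set) -> float:
        # Model output when only features in `coalition` keep their observed values.
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return f(z)

    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions of size k
            weight = math.factorial(k) * math.factorial(p - k - 1) / math.factorial(p)
            for combo in itertools.combinations(others, k):
                subset = set(combo)
                phi[i] += weight * (value(subset | {i}) - value(subset))
    return phi


if __name__ == "__main__":
    names = ["WBC", "PDW", "TB", "ANC", "ALT", "AST"]
    coef = [0.4, 0.3, -0.2, 0.5, 0.1, -0.3]  # hypothetical weights on z-scores

    def toy_risk(z: Sequence[float]) -> float:
        # Hypothetical logistic risk score with one WBC x ANC interaction.
        s = sum(c * v for c, v in zip(coef, z)) + 0.2 * z[0] * z[3]
        return 1.0 / (1.0 + math.exp(-s))

    x = [1.0, -0.5, 2.0, 1.5, 0.3, -1.0]  # one hypothetical patient (z-scores)
    base = [0.0] * 6                       # cohort mean after standardisation
    for name, v in zip(names, shapley_values(toy_risk, x, base)):
        print(f"{name}: {v:+.3f}")
```

The values always sum to `f(x) - f(baseline)` (the efficiency property). Each marker's share of an individual's risk above the baseline can therefore be read off directly, which is the kind of personalized breakdown the SHAP analysis provides.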

The indicators ultimately found to be associated with pneumoconiosis progression were WBC, PDW, TB, ANC, ALT and AST. By combining blood biochemical indicators, the ML approach identified risk factors associated with the occurrence of pneumoconiosis. The SVM model performed well and has the potential to improve early detection and diagnosis in clinical practice.

## Linked entities

- **Diseases:** pneumoconiosis (MONDO:0015926)

## Full-text entities

- **Genes:** SLC17A5 (solute carrier family 17 member 5) [NCBI Gene 26503] {aka AST, ISSD, NSD, SD, SIALIN, SIASD}, GPT (glutamic--pyruvic transaminase) [NCBI Gene 2875] {aka AAT1, ALT, ALT1, GPT1, SGPT}
- **Diseases:** pneumoconiosis (MESH:D011009)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12819255/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12819255/full.md

## References

112 references — full list in the complete paper: https://tomesphere.com/paper/PMC12819255/full.md

---
Source: https://tomesphere.com/paper/PMC12819255