# Development and internal validation of a machine learning–based prediction model for pulmonary hypertension in COPD

**Authors:** Ruoyu Wang, Jie Tan, Guangping Li, Zhenyu Pan, Huiling Guo, Wei Sun, Jing Wang

PMC · DOI: 10.3389/fmed.2026.1752113 · Frontiers in Medicine · 2026-02-18

## TL;DR

This retrospective, single-center study developed and internally validated a machine learning model that predicts pulmonary hypertension in patients with COPD from noninvasive clinical data, with the aim of supporting earlier detection and risk-stratified management.

## Contribution

A CatBoost-based model that predicts PH in COPD from 18 routinely available noninvasive variables, with good discrimination (AUC 0.848) and SHAP-based feature-level explanations.

## Key findings

- The CatBoost model achieved an AUC of 0.848, with accuracy 0.830, sensitivity 0.758, and specificity 0.866 for PH prediction.
- SHAP analysis identified right ventricular diameter, pulmonary artery diameter, arterial partial pressure of carbon dioxide, right atrial transverse diameter, and age as the most influential predictors.
- The model provides transparent feature-level explanations, which may support clinical decision-making for COPD patients at risk of PH.

## Abstract

Chronic obstructive pulmonary disease (COPD) is frequently complicated by pulmonary hypertension (PH), which worsens prognosis, but early PH detection is limited by the invasiveness or suboptimal sensitivity of current diagnostic tools.

In this retrospective study, we analyzed 523 hospitalized patients with COPD from Beijing Chaoyang Hospital. After standardized preprocessing and recursive feature elimination, 18 routinely available noninvasive clinical and physiological variables were retained as predictors. Eight machine-learning algorithms were trained to predict PH and compared using area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, F1 score, and decision-curve analysis; model interpretability was assessed with Shapley additive explanations (SHAP).
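The evaluation metrics named above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' pipeline: the function names, the toy labels, and the toy probabilities are all hypothetical, and the 0.5 and 0.25 thresholds are arbitrary choices for the example. AUC is computed as the fraction of positive/negative pairs ranked correctly, and net benefit follows the standard decision-curve definition, TP/n − FP/n × pt/(1 − pt).

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), predicting PH when probability >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 (assumes non-zero denominators)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Decision-curve net benefit at one threshold probability (0 < threshold < 1)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Hypothetical toy data (not from the study): 1 = PH, 0 = no PH.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

counts = confusion_counts(toy_labels, toy_probs, threshold=0.5)
print(threshold_metrics(*counts))
print("AUC:", auc(toy_labels, toy_probs))
print("Net benefit at 0.25:", net_benefit(toy_labels, toy_probs, 0.25))
```

AUC is threshold-free, whereas accuracy, sensitivity, specificity, F1, and net benefit all depend on the chosen probability cutoff. A decision curve is obtained by evaluating `net_benefit` over a range of thresholds.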

The CatBoost model showed the best discrimination (AUC 0.848; accuracy 0.830; sensitivity 0.758; specificity 0.866; F1 0.746). SHAP analysis identified right ventricular diameter, pulmonary artery diameter, arterial partial pressure of carbon dioxide, right atrial transverse diameter, and age as the most influential predictors.
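To make the idea behind the SHAP ranking concrete, the sketch below computes exact Shapley values by brute force for a tiny, hypothetical risk score. It is an illustration only: the feature names, coefficients, and the `toy_value` function are invented for the example, and this summary does not state which SHAP explainer the authors used. Practical SHAP implementations rely on faster algorithms than the exhaustive enumeration shown here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: weighted average of each feature's marginal contribution."""
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight for coalitions of size k that do not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(present: FrozenSet[str]) -> float:
    """Hypothetical risk score when only `present` features take the patient's values.

    Features outside `present` are held at a baseline that contributes nothing.
    """
    score = 0.1  # baseline risk with no feature information
    if "rv_diameter" in present:
        score += 0.3
    if "pa_diameter" in present:
        score += 0.2
    if "paco2" in present:
        score += 0.1
    # Interaction term: both diameters enlarged together add extra risk.
    if "rv_diameter" in present and "pa_diameter" in present:
        score += 0.2
    return score


toy_features = ["rv_diameter", "pa_diameter", "paco2"]
phi = shapley_values(toy_features, toy_value)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:.3f}")
```

The contributions sum to the difference between the full prediction and the baseline, which is the property that lets SHAP attribute an individual patient's predicted risk to specific variables. In this toy case the interaction term is split equally between the two diameters.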

A CatBoost-based prediction model using readily obtainable noninvasive variables can estimate PH risk in COPD with good accuracy and provide transparent feature-level explanations, potentially facilitating earlier detection and risk-stratified management.

## Linked entities

- **Diseases:** pulmonary hypertension (MONDO:0005149), chronic obstructive pulmonary disease (COPD; MONDO:0005002)

## Full-text entities

- **Genes:** NPPB (natriuretic peptide B) [NCBI Gene 4879] {aka BNP, Iso-ANP}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** lung (MESH:D008171), endothelial dysfunction (MESH:D014652), malignancy (MESH:D009369), dyspnea (MESH:D004417), Lung tissue (MESH:D055370), hepatic or renal insufficiency (MESH:D048550), inflammation (MESH:D007249), PTE (MESH:D011655), Hypoxemia (MESH:D000860), GOLD 1-4 (MESH:C567520), blood vessels (MESH:D009383), hypoxic (MESH:D002534), RVD (MESH:D018487), Hypercapnia (MESH:D006935), Chronic Obstructive Lung Disease (MESH:D029424), cardiovascular or cerebrovascular disease (MESH:D002318), congenital heart disease (MESH:D006330), idiopathic pulmonary arterial hypertension (MESH:D065627), PAH (MESH:D000081029), death (MESH:D003643), blockage of airflow (MESH:D015508), GOLD 1 (MESH:C538557), PH (MESH:D006976), TD (MESH:D004409), RHC (MESH:D006333), systemic (MESH:D015619), cardiac remodeling (MESH:D020257), parenchymal disease (MESH:D017563)
- **Chemicals:** uric acid (MESH:D014527), PaCO2 (-), ROS (MESH:D017382), creatinine (MESH:D003404), Carbon dioxide (MESH:D002245)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12956692/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12956692/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12956692/full.md

---
Source: https://tomesphere.com/paper/PMC12956692