# A Multicohort Machine Learning Framework to Predict Mortality in Elderly Patients With Heart Disease: Insights From CHARLS, SHARE, and HRS

**Authors:** Zhiqiang Yang, Xiaohong Zhang

PMC · DOI: 10.1155/cdr/8040700 · Cardiovascular Therapeutics · 2026-01-02

## TL;DR

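This study trains machine learning models on the European SHARE cohort to predict mortality in elderly heart disease patients and externally validates them on CHARLS (China) and HRS (United States). XGBoost performed best, with an average AUC of 0.798 across all datasets, and SHAP analysis identified age as the most influential predictor in every cohort.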

## Contribution

A multicohort machine learning framework with cross-cultural validation and interpretability for elderly heart disease mortality prediction.

## Key findings

- XGBoost achieved the highest average AUC of 0.798 across all datasets, showing strong generalizability.
- Age was the most influential predictor across all cohorts, with SHAP values ranging from 0.056 to 0.102.
- Feature dependence analysis revealed nonlinear relationships, such as a U-shaped association between grip strength and mortality risk.

## Abstract

Elderly patients with heart disease face elevated mortality risk, yet predictive models specifically tailored for this population across different global regions remain limited. Current mortality prediction tools often lack cross‐cultural validation and interpretability, hindering their clinical application in diverse healthcare settings.

We developed and validated machine learning models for predicting mortality in elderly heart disease patients using data from three major aging cohorts: the China Health and Retirement Longitudinal Study (CHARLS, n = 2130), the Survey of Health, Ageing and Retirement in Europe (SHARE, n = 10,928), and the Health and Retirement Study (HRS) from the United States (n = 4835). Boruta feature selection identified 27 common predictors across cohorts. Eleven machine learning algorithms were trained on the SHARE cohort (70% training and 30% testing) and externally validated on CHARLS and HRS cohorts. Model performance was assessed using area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration metrics. SHapley Additive exPlanations (SHAP) analysis was employed to interpret model predictions.
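As a rough illustration of the evaluation steps described above, the following standard-library sketch shows a 70/30 split and how AUC, sensitivity, and specificity can be computed from predicted risk scores. The function names (`split_70_30`, `auc`, `sensitivity_specificity`), the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; they are not taken from the paper, which does not describe its implementation in this summary.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle a copy of `rows` and split it 70% / 30% (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity at an assumed decision threshold.
    Assumes both classes are present in `labels`."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (1 = died) and predicted risks, invented for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
train_rows, test_rows = split_70_30(list(range(10)))
```

In an external-validation setting such as the one described, the same `auc` call would be applied to scores that a model fitted on one cohort produces for a different cohort. The pairwise comparison is quadratic in sample size, so a rank-based implementation would be preferable for cohorts of the sizes reported here.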

XGBoost demonstrated superior performance with the highest average AUC (0.798) across all datasets, showing excellent generalizability from the SHARE training set (AUC: 0.805) to internal validation (AUC: 0.799) and external validation in HRS (AUC: 0.821) and CHARLS (AUC: 0.770) cohorts. Age consistently emerged as the most influential predictor across all cohorts (SHAP values: 0.056–0.102), followed by gender, moderate physical activity, and self‐rated health, though their relative importance varied by cohort. Feature dependence analysis revealed important nonlinear relationships, including U‐shaped associations between grip strength and mortality risk.
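SHAP attributions build on Shapley values, so a small worked example may help in reading the feature-importance figures above. The sketch below computes exact Shapley values by enumerating feature orderings for a toy risk function. Everything in it is hypothetical: `toy_risk`, its coefficients, and the baseline are invented for illustration and are not the paper's model, and the SHAP library uses faster algorithms for tree models rather than brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Each feature's marginal contribution is averaged over every order
    in which features can be switched from baseline to observed values.
    Cost grows factorially, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]
            new = predict(current)
            phi[i] += new - prev
            prev = new
    return [p / len(orders) for p in phi]


def toy_risk(features):
    """Hypothetical risk score with invented coefficients (not from the paper).
    The squared grip-strength term mimics a U-shaped dependence."""
    age, grip, poor_health = features
    return 0.03 * (age - 65) + 0.002 * (grip - 30) ** 2 + 0.5 * poor_health


patient = [80, 20, 1]    # age, grip strength, poor self-rated health flag
reference = [65, 30, 0]  # assumed baseline profile
phi = shapley_values(toy_risk, patient, reference)
```

By construction the attributions sum to `toy_risk(patient) - toy_risk(reference)`, the efficiency property that makes per-feature contributions comparable within a prediction. Because the toy function is additive, each attribution equals the change in its own term.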

Our multicohort machine learning approach successfully developed a robust, interpretable model for predicting mortality in elderly heart disease patients across diverse global populations. The model's strong performance in external validation demonstrates its potential for cross‐cultural clinical application, while SHAP analysis provides valuable insights into population‐specific risk factors that could guide targeted interventions.

## Linked entities

- **Diseases:** heart disease (MONDO:0005267)

## Full-text entities

- **Diseases:** Heart Disease (MESH:D006331), Mortality (MESH:D003643)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12759112/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12759112/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12759112/full.md

---
Source: https://tomesphere.com/paper/PMC12759112