# Machine learning prediction of survival in centenarians after age 100: a retrospective, population-based cohort study

**Authors:** Jonathan K L Mak, Noel C Yue, Gloria Hoi-Yee Li, Jacqueline K Yuen, Tung Wai Auyeung, Kathryn Choon Beng Tan, Ching-Lung Cheung

PMC · DOI: 10.1093/gerona/glaf218 · The Journals of Gerontology Series A: Biological Sciences and Medical Sciences · 2025-10-09

## TL;DR

This study uses machine learning and health records to predict how long centenarians will live after age 100, finding moderate accuracy for short-term predictions.

## Contribution

The study demonstrates the feasibility of using machine learning on electronic health records to predict mortality in centenarians.

## Key findings

- The eXtreme Gradient Boosting (XGBoost) model performed best, with AUROCs of 0.707 for 1-year and 0.704 for 2-year mortality prediction in the testing cohort.
- Lower albumin, frequent hospitalizations, and higher urea levels were top predictors of mortality.
- Models incorporating these predictors outperformed traditional comorbidity and frailty scores in predicting mortality in an independent cohort of oldest-old adults aged 85-105 years.

## Abstract

Whether survival at extreme ages can be accurately predicted remains unclear. This study explored the feasibility of using machine learning (ML) and electronic health records (EHRs) to predict mortality in centenarians and identify key survival determinants.

We analyzed 9718 centenarians (83% women) from the population-based EHR database in Hong Kong (2004-2018). Data were randomly split into 70% training and 30% testing cohorts. Using 82 predictors, including demographics, diagnoses, prescriptions, and laboratory results, we trained stepwise logistic regression and four ML algorithms to predict 1-year, 2-year, and 5-year all-cause mortality after age 100. Model performance was evaluated using discrimination (area under the receiver operating characteristic curve [AUROC]) and calibration metrics. In an independent cohort of 174 606 oldest-old adults aged 85-105 years, we further compared AUROCs of models incorporating the identified predictors versus comorbidity and frailty scores across different age groups.
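The random 70/30 split and the calibration side of the evaluation can be sketched in plain Python. This is not the authors' code. It assumes each record carries a binary death indicator and a predicted risk in [0, 1]. The function names, the fixed seed, and the choice of calibration summaries (Brier score, observed/expected ratio, grouped predicted-vs-observed table) are illustrative assumptions, because this summary does not list which calibration metrics the study reported.

```python
import random

def split_cohort(records, train_frac=0.7, seed=2025):
    """Randomly split records into training and testing cohorts (70/30 by default).

    Hypothetical helper; the seed is arbitrary and only makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome (0 = perfect)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def observed_expected_ratio(y_true, y_prob):
    """Calibration-in-the-large: observed deaths divided by predicted deaths (1 = ideal)."""
    return sum(y_true) / sum(y_prob)

def calibration_table(y_true, y_prob, n_groups=10):
    """(mean predicted risk, observed death rate) for groups of increasing predicted risk."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    table = []
    for g in range(n_groups):
        idx = order[g * len(order) // n_groups:(g + 1) * len(order) // n_groups]
        if idx:  # skip empty groups when there are fewer records than groups
            mean_pred = sum(y_prob[i] for i in idx) / len(idx)
            obs_rate = sum(y_true[i] for i in idx) / len(idx)
            table.append((mean_pred, obs_rate))
    return table
```

A well-calibrated model has an observed/expected ratio near 1 and grouped rows whose two values track each other. Poor calibration, as reported for 5-year mortality, shows up as systematic gaps between predicted and observed risk, even when discrimination is acceptable.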

Among the ML models, the eXtreme Gradient Boosting (XGBoost) algorithm performed best, with AUROCs of 0.707 (95% CI = 0.685-0.730) for 1-year mortality and 0.704 (0.686-0.723) for 2-year mortality in the testing cohort. However, all models showed poor calibration for 5-year mortality. The top three predictors of mortality were lower albumin levels, more frequent hospitalizations, and higher urea levels. Models including these predictors consistently outperformed comorbidity and frailty scores for mortality prediction among oldest-old adults.
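An AUROC with a 95% CI, as reported above, can be illustrated with a rank-based AUROC and a percentile bootstrap. This is a sketch under stated assumptions. This summary does not say how the study computed its confidence intervals (DeLong's method is a common alternative), so the bootstrap here is a swapped-in technique. The function names and defaults (`n_boot`, `seed`) are hypothetical. The AUROC is the Mann-Whitney probability that a randomly chosen decedent receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half.

```python
import random

def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both deaths (1) and survivors (0)")
    rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point AUROC plus a percentile bootstrap (1 - alpha) confidence interval."""
    point = auroc(y_true, y_score)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample records with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one outcome class
            stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[round(alpha / 2 * (n_boot - 1))]
    upper = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return point, lower, upper
```

The rank-based form costs O(n log n) per resample rather than comparing every decedent-survivor pair. This keeps 1000 resamples of a testing cohort of roughly 2900 records practical.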

ML models applied to routinely collected EHRs can predict short-term survival in centenarians with moderate accuracy. Further research is needed to determine whether mortality predictors differ by age within the oldest-old population.

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** frailty (MESH:D000073496)
- **Chemicals:** urea (MESH:D014508)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12598932/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12598932/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC12598932/full.md

---
Source: https://tomesphere.com/paper/PMC12598932