# Predicting health-related quality of life two years post-diagnosis across seven cancer types: using machine learning to identify vulnerable patients

**Authors:** Willemijn F. Oudijk, Belle H. de Rooij, Koen J. van Benthem, Rampal S. Etienne, Simone Oerlemans, Helena M. Verkooijen, Katja K. H. Aben, Geraldine R. Vink, Anne M. May, Floortje Mols, Dimitris Katsimpokis, Nicole P. M. Ezendam

PMC · DOI: 10.1007/s11136-026-04165-4 · Quality of Life Research · 2026-02-12

## TL;DR

This study uses machine learning to predict health-related quality of life in cancer survivors two years after diagnosis and identifies key factors that make some patients more vulnerable.

## Contribution

The novelty lies in applying multiple machine learning models to predict HRQoL and identifying consistent vulnerability factors across seven cancer types.

## Key findings

- All models performed similarly in predicting HRQoL, with R² of about 0.3 and RMSE of about 9.
- Lower functioning at diagnosis, comorbidities, cancer type (especially endometrial), and higher BMI were top vulnerability factors.
- All models tended to overestimate HRQoL for patients with low scores, possibly because the data contained few observations with low HRQoL.
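
As a reading aid for the two metrics above, here is a minimal sketch of how R² and RMSE are commonly computed from observed and predicted scores. The numbers are hypothetical and are not taken from the study; the function names are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error: typical size of a prediction error, in score units."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Share of outcome variance explained: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical summary scores (not data from the study).
observed = [60.0, 75.0, 85.0, 90.0, 95.0]
predicted = [78.0, 80.0, 82.0, 84.0, 86.0]

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"R2   = {r_squared(observed, predicted):.2f}")
```

An R² near 0.3 means the predictions capture roughly 30% of the variance in the outcome, while the RMSE expresses the typical error in the same units as the predicted score.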

## Abstract

Cancer survivors often experience long-term consequences affecting their Health-Related Quality of Life (HRQoL). Sociodemographic factors, clinical characteristics, and health-related behaviours influence HRQoL, making some individuals vulnerable to adverse HRQoL. This study develops linear regression and machine learning models to predict HRQoL two years post-diagnosis and to identify key vulnerability factors.

This longitudinal study included data from survivors of seven cancer types. Nineteen predictor variables were derived from questionnaires completed within three months post-diagnosis (baseline) from the Netherlands Cancer Registry. Linear regression, random forest, XGBoost, neural network, and Support Vector Machine (SVM) regressors were employed to predict the EORTC QLQ-C30 summary score 1.5–2.5 years post-diagnosis. Permutation testing assessed vulnerability factors.
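
The permutation step can be illustrated with a small sketch. It assumes one common variant, permutation importance, in which a predictor's values are shuffled and the resulting increase in prediction error is measured; the authors' exact procedure may differ. The model `toy_model`, the predictor names, and all data below are hypothetical stand-ins, not the study's fitted models or variables.

```python
import random


def mean_squared_error(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def permutation_importance(predict, rows, observed, column, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling the values of one predictor."""
    rng = random.Random(seed)
    baseline = mean_squared_error(observed, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the shuffled predictor.
        permuted_rows = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
        permuted_error = mean_squared_error(
            observed, [predict(r) for r in permuted_rows]
        )
        increases.append(permuted_error - baseline)
    return sum(increases) / n_repeats


def toy_model(row):
    """Hypothetical stand-in for a fitted regressor (coefficients are invented)."""
    return 40.0 + 0.5 * row["physical_functioning"] - 3.0 * row["comorbidities"]


# Synthetic data: two informative predictors and one the model ignores.
data_rng = random.Random(42)
rows = [
    {
        "physical_functioning": data_rng.uniform(40, 100),
        "comorbidities": data_rng.randint(0, 4),
        "irrelevant_noise": data_rng.uniform(0, 1),
    }
    for _ in range(200)
]
observed = [toy_model(r) + data_rng.gauss(0, 5) for r in rows]

importances = {
    name: permutation_importance(toy_model, rows, observed, name) for name in rows[0]
}
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.2f}")
```

A predictor the model relies on produces a clear increase in error when shuffled, whereas a predictor the model ignores produces none. The approach works with any fitted regressor because it only needs a prediction function.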

The analyses included 4,538 individuals. All models achieved similar R² (0.3) and RMSE (9) scores. Linear regression, random forest, XGBoost, and SVM models identified lower physical, cognitive, and emotional functioning at diagnosis, along with more comorbidities, cancer type (especially endometrial), and higher BMI as the top vulnerability factors. Treatment, age, and education were not associated with vulnerability. All models tended to overestimate low HRQoL, which might be due to the limited number of observations with low HRQoL values.
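
One way to check for the overestimation described above is to compare predictions with observations separately within bands of the observed score. The sketch below is an assumed diagnostic, not the authors' analysis; the cutoffs (60 and 85) and all scores are hypothetical.

```python
def mean_error_by_band(observed, predicted, cutoffs):
    """Mean (predicted - observed) within bands of the observed score.

    A positive value means the model overestimates in that band.
    """
    edges = [float("-inf")] + sorted(cutoffs) + [float("inf")]
    result = {}
    for low, high in zip(edges[:-1], edges[1:]):
        errors = [p - o for o, p in zip(observed, predicted) if low <= o < high]
        if errors:
            result[(low, high)] = sum(errors) / len(errors)
    return result


# Hypothetical scores: predictions are pulled toward the middle of the scale.
observed = [45.0, 55.0, 72.0, 80.0, 88.0, 92.0, 96.0, 98.0]
predicted = [70.0, 74.0, 78.0, 82.0, 86.0, 88.0, 90.0, 91.0]

bands = mean_error_by_band(observed, predicted, cutoffs=[60.0, 85.0])
for (low, high), error in bands.items():
    print(f"observed in [{low}, {high}): mean error {error:+.2f}")
```

In this toy example the lowest band shows a large positive mean error, which is the pattern expected when low scores are rare in the training data and predictions shrink toward the overall mean.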

The predictors used in this analysis explained only 30% of the variation in long-term HRQoL. As in previous studies predicting HRQoL in cancer, these predictors miss crucial information. Baseline functioning, comorbidities, cancer type, and BMI appeared to be the key vulnerability factors. Future studies should prioritize accurate prediction of low HRQoL scores.

The online version contains supplementary material available at 10.1007/s11136-026-04165-4.

## Linked entities

- **Diseases:** cancer (MONDO:0004992), endometrial cancer (MONDO:0002447)

## Full-text entities

- **Diseases:** Cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12901195/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12901195/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12901195/full.md

---
Source: https://tomesphere.com/paper/PMC12901195