# Privacy-preserving federated prediction of health outcomes using multi-center survey data

**Authors:** Supratim Das, Mahdie Rafiei, Paula T. Kammer, Søren T. Skou, Dorte T. Grønne, Ewa M. Roos, André Hajek, Hans-Helmut König, Md Shihab Ullah, Niklas Probul, Jan Baumbach, Linda Baumbach

PMC · DOI: 10.1186/s12874-026-02785-5 · BMC Medical Research Methodology · 2026-02-04

## TL;DR

This paper explores using privacy-preserving federated learning to build accurate health outcome prediction models from multi-center survey data without centralizing sensitive patient information.

## Contribution

The study demonstrates that federated learning can achieve comparable performance to centralized models while preserving data privacy in multi-center healthcare settings.

## Key findings

- In GLA:D® data, federated models significantly outperformed local models and did not differ significantly from centralized models.
- In SHARE data, federated and centralized models both significantly outperformed local models.
- Federated learning enables privacy-preserving model training with minimal or no loss in performance relative to centralized training.

## Abstract

Patient-reported survey data are used to train prognostic models aimed at improving healthcare. However, such data are typically distributed across multiple centers and, for privacy reasons, cannot easily be centralized in a single data repository. Models trained on local data alone are less accurate, robust, and generalizable. We aim to investigate the applicability of privacy-preserving federated machine learning techniques for building prognostic models from health survey data, where local data never leave the legally safe harbors of the medical centers.

We used centralized, local, and federated learning techniques on two healthcare datasets (GLA:D® data from the five health regions of Denmark and international SHARE data from 27 countries) to predict two different health outcomes. We compared linear regression, random forest regression, and random forest classification models trained on local data with those trained on the entire dataset in a centralized and in a federated fashion.
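The abstract does not state how the federated models were trained. For linear regression, one common approach (assumed here, not confirmed as the authors' protocol) is for each center to share only the aggregate matrices XᵀX and Xᵀy computed on its own rows. A coordinator sums them and solves the normal equations, which yields exactly the centralized least-squares fit without any raw records leaving a center. The sketch below uses plain Python; the function names `local_statistics`, `aggregate`, and `solve`, as well as the three toy centers, are hypothetical.

```python
"""Sketch (assumed approach): exact federated linear regression via summed sufficient statistics."""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def local_statistics(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[Matrix, Vector]:
    """Runs at each center: returns X^T X and X^T y, with an intercept column.

    Only these aggregates leave the site; the raw rows stay local."""
    rows = [[1.0, *map(float, x)] for x in X]  # prepend the intercept term
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return xtx, xty


def aggregate(stats: Sequence[Tuple[Matrix, Vector]]) -> Tuple[Matrix, Vector]:
    """Runs at the coordinator: element-wise sum of the per-center statistics."""
    p = len(stats[0][1])
    xtx = [[sum(s[0][i][j] for s in stats) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in stats) for i in range(p)]
    return xtx, xty


def solve(a: Matrix, b: Vector) -> Vector:
    """Solves a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


# Hypothetical toy data: three centers, one predictor, y roughly 2 + 3x.
centers = [
    ([[0.0], [1.0], [2.0]], [2.1, 4.9, 8.2]),
    ([[3.0], [4.0]], [11.0, 13.8]),
    ([[5.0], [6.0], [7.0]], [17.1, 20.2, 22.9]),
]
federated = solve(*aggregate([local_statistics(X, y) for X, y in centers]))

# Centralized baseline: pool all rows (what federation avoids) and fit once.
all_X = [x for X, _ in centers for x in X]
all_y = [v for _, y in centers for v in y]
centralized = solve(*local_statistics(all_X, all_y))

# Local baseline: a model fitted on the first center only.
local_only = solve(*local_statistics(*centers[0]))
```

Because XᵀX = Σᵢ XᵢᵀXᵢ and Xᵀy = Σᵢ Xᵢᵀyᵢ, `federated` and `centralized` agree up to floating-point rounding, while `local_only` depends on a single center's small sample. Shared aggregates can still leak some information about the underlying data, so in practice this kind of scheme is often combined with secure aggregation or differential privacy. Random forests do not decompose this way and would need a different federated scheme.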

In GLA:D® data, federated linear regression (R² 0.34, RMSE 18.2) and federated random forest regression (R² 0.34, RMSE 18.3) models outperform their local counterparts (R² 0.32, RMSE 18.6 and R² 0.30, RMSE 18.8, respectively) with statistical significance. We also found that the centralized models (R² 0.34, RMSE 18.2 and R² 0.32, RMSE 18.5, respectively) did not perform significantly better than the federated models. In SHARE data, the federated model (AC 0.78, AUROC 0.71) and the centralized model (AC 0.84, AUROC 0.66) perform significantly better than the local models (AC 0.74, AUROC 0.69).
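The results above are reported as R² and RMSE for the regression outcome and as AC and AUROC for the classification outcome. The sketch below shows the standard definitions of these metrics in plain Python. It assumes that AC denotes accuracy and that class labels are obtained by thresholding a score at 0.5. The summary does not state the threshold, how metrics were averaged across centers or folds, or how significance was tested, so none of that is modeled here. The toy inputs are hypothetical.

```python
"""Sketch: standard definitions of R², RMSE, accuracy, and AUROC."""
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the outcome."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true: Sequence[int], y_label: Sequence[int]) -> float:
    """Fraction of predicted class labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_label)) / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical regression example.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(r2(y_true, y_pred))    # ≈ 0.975
print(rmse(y_true, y_pred))  # ≈ 0.354

# Hypothetical classification example; labels come from an assumed 0.5 threshold.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5]
predicted = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(labels, predicted))  # ≈ 0.6
print(auroc(labels, scores))        # ≈ 0.833
```

Accuracy depends on the chosen threshold, whereas AUROC measures ranking quality across all thresholds. This is why the two can diverge, as they do for the SHARE models above: the centralized model has the highest AC (0.84) but a lower AUROC (0.66) than the local models (0.69). The pairwise AUROC loop is O(n_pos · n_neg). For large cohorts, a rank-based (Mann-Whitney U) computation is the usual faster equivalent.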

Federated learning enables the training of prognostic models from multi-center surveys without compromising privacy and with only minimal or no compromise regarding model performance.

The online version contains supplementary material available at 10.1186/s12874-026-02785-5.

## Full-text entities

- **Diseases:** VAS pain (MESH:D010146), physical (MESH:D059445), Knee/hip pain (MESH:D046788), Osteoarthritis (MESH:D010003), SHARE (OMIM:603663), Knee Injury and Osteoarthritis (MESH:D020370), depressive symptom (MESH:D003866), brain tumor (MESH:D001932), movement limitations (MESH:D045745)
- **Chemicals:** DP (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12930927/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12930927/full.md

## References

15 references, with the full list in the complete paper: https://tomesphere.com/paper/PMC12930927/full.md

---
Source: https://tomesphere.com/paper/PMC12930927