# Joint models in big data: simulation-based guidelines for required data quality in longitudinal electronic health records

**Authors:** Berit Hunsdieck, Christian Bender, Katja Ickstadt, Johanna Mielke

PMC · DOI: 10.1186/s13040-025-00450-z · BioData Mining · 2025-05-13

## TL;DR

This paper uses simulations to determine how the quality of electronic health record (EHR) data affects the performance of joint models relative to traditional Cox survival models.

## Contribution

The study provides simulation-based guidelines on the quality of longitudinal EHR data required for joint models to outperform Cox models.

## Key findings

- Biomarker changes before disease onset must be consistent within similar patient groups for joint models to perform well.
- Joint models surpass Cox regression as measurement density and measurement noise increase.
- The guidelines are illustrated using real-world examples of liver cirrhosis and chronic kidney disease.

## Abstract

Over the past decade, an increase in the use of electronic health records (EHR) by office-based physicians and hospitals has been reported. However, these data come with challenges regarding completeness and data quality, and it is unclear, especially for more complex models, how these characteristics influence model performance.

In this paper, we focus on joint models, which combine longitudinal modelling with survival modelling to incorporate all available information. The aim of this paper is to establish simulation-based guidelines for the quality of longitudinal EHR data required for joint models to perform better than Cox models. We conducted an extensive simulation study by systematically and transparently varying different characteristics of data quality, e.g., measurement frequency, noise, and heterogeneity between patients. We apply the joint models and evaluate their performance relative to traditional Cox survival modelling techniques.
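For orientation, a standard shared-parameter joint model, as it is commonly written in the joint-modelling literature, links the two submodels through the patient's true (error-free) biomarker value. This is a textbook formulation that assumes a linear mixed-effects trajectory and a current-value association; the exact parametrization used in the paper may differ.

$$
\begin{aligned}
y_i(t_{ij}) &= m_i(t_{ij}) + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),\\
m_i(t) &= (\beta_0 + b_{i0}) + (\beta_1 + b_{i1})\, t, \qquad \mathbf{b}_i = (b_{i0}, b_{i1})^\top \sim \mathcal{N}(\mathbf{0}, \mathbf{D}),\\
h_i(t) &= h_0(t)\, \exp\!\bigl(\boldsymbol{\gamma}^\top \mathbf{w}_i + \alpha\, m_i(t)\bigr).
\end{aligned}
$$

Here $y_i(t_{ij})$ is the $j$-th observed measurement of patient $i$, $m_i(t)$ the underlying trajectory, $\mathbf{w}_i$ the baseline covariates, and $\alpha$ the strength of the association between biomarker and hazard. Under this formulation, the data-quality dimensions varied in the simulation can be read roughly as follows: measurement frequency sets how many times $t_{ij}$ are observed, noise corresponds to $\sigma^2$, and between-patient heterogeneity to the random-effects covariance $\mathbf{D}$. A time-dependent Cox model would instead enter the most recently observed $y_i$ directly into the hazard, treating it as error-free.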

Key findings suggest that biomarker changes before disease onset must be consistent within similar patient groups. With increasing noise and a higher measurement density, the joint model surpasses the traditional Cox regression model in terms of model performance. We illustrate the usefulness and limitations of the guidelines with two real-world examples, namely the influence of serum bilirubin on primary biliary liver cirrhosis and the influence of the estimated glomerular filtration rate on chronic kidney disease.
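The intuition behind the noise and measurement-density finding can be illustrated with a small, self-contained sketch. It is not the paper's simulation design. It assumes a hypothetical linear biomarker with patient-specific intercepts and slopes and compares two estimates of the biomarker's true value at the last visit. The first is the raw last measurement, which is what a time-dependent Cox model using the most recent observation would see. The second is a per-patient least-squares line, a crude stand-in for the smoothing performed by a joint model's longitudinal submodel. All function names and parameter values are illustrative.

```python
import math
import random
from statistics import mean


def simulate_patient(n_obs, noise_sd, rng, follow_up=5.0):
    """Simulate one patient's true linear biomarker and noisy measurements.

    Hypothetical data-generating process: intercept and slope vary between
    patients (random effects); n_obs >= 2 measurements are equally spaced
    over follow_up years and perturbed by Gaussian noise with sd noise_sd.
    """
    intercept = 1.0 + rng.gauss(0.0, 0.5)  # patient-specific intercept
    slope = 0.3 + rng.gauss(0.0, 0.1)      # patient-specific slope
    times = [follow_up * j / (n_obs - 1) for j in range(n_obs)]
    true_vals = [intercept + slope * t for t in times]
    observed = [v + rng.gauss(0.0, noise_sd) for v in true_vals]
    return times, true_vals, observed


def ols_fit(times, values):
    """Ordinary least-squares line through one patient's measurements."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def current_value_errors(n_obs, noise_sd, n_patients=2000, seed=1):
    """Return (RMSE of raw last value, RMSE of fitted line) at the last visit,
    both measured against the true biomarker value at that time."""
    rng = random.Random(seed)
    se_raw, se_fit = [], []
    for _ in range(n_patients):
        times, true_vals, observed = simulate_patient(n_obs, noise_sd, rng)
        a, b = ols_fit(times, observed)
        t_last, m_last = times[-1], true_vals[-1]
        se_raw.append((observed[-1] - m_last) ** 2)  # last observation carried forward
        se_fit.append((a + b * t_last - m_last) ** 2)  # smoothed estimate
    return math.sqrt(mean(se_raw)), math.sqrt(mean(se_fit))


for n_obs in (3, 10):
    for noise_sd in (0.1, 0.5):
        raw, fit = current_value_errors(n_obs, noise_sd)
        print(f"n_obs={n_obs:2d} noise_sd={noise_sd}: "
              f"last-value RMSE={raw:.3f}, fitted RMSE={fit:.3f}")
```

For equally spaced visits, least-squares theory puts the fitted line's error at the last visit at about 0.91 times `noise_sd` with 3 measurements and about 0.59 times `noise_sd` with 10, while the raw last value stays at about `noise_sd`. The absolute gap therefore widens with both noise and measurement density. A real joint model additionally borrows strength across patients through the random effects and estimates the hazard association jointly, which this sketch ignores.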

## Linked entities

- **Diseases:** chronic kidney disease (MONDO:0005300)

## Full-text entities

- **Diseases:** chronic kidney disease (MESH:D051436), primary biliary liver cirrhosis (MESH:D008105)
- **Chemicals:** bilirubin (MESH:D001663)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12070788/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12070788/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/PMC12070788/full.md

---
Source: https://tomesphere.com/paper/PMC12070788