# Integrative analysis and imputation of multiple data streams via deep Gaussian processes

**Authors:** Ali A Septiandri, Deyu Ming, Francisco Alejandro DiazDelaO, Takoua Jendoubi, Samiran Ray

PMC · DOI: 10.1093/bioadv/vbaf305 · 2025-11-27

## TL;DR

This paper introduces a new method using deep Gaussian processes to better handle missing data in critical care settings by capturing relationships between measurements and providing uncertainty estimates.

## Contribution

The novel contribution is applying deep Gaussian process emulation with stochastic imputation to critical care data for improved missing value handling.

## Key findings

- On a clinical dataset, the proposed method outperforms conventional imputation techniques: MICE, last-known value imputation, and individually fitted Gaussian processes.
- The method effectively captures longitudinal and cross-sectional relationships in clinical data.
- It provides uncertainty estimates for imputed values, which is crucial for clinical decision-making.
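
For context, last-known value imputation, one of the baselines listed above, can be sketched in a few lines. This is only an illustration of the general technique, not the authors' code; the function name `impute_last_known` and the lactate readings are hypothetical.

```python
from typing import List, Optional, Sequence


def impute_last_known(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill each gap with the most recent observed value (None marks a gap)."""
    imputed: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in values:
        if value is not None:
            last_seen = value
        # Gaps before the first observation stay None: there is nothing to carry.
        imputed.append(last_seen)
    return imputed


# Hypothetical lactate readings (mmol/L) with gaps.
lactate = [None, 1.8, None, None, 2.4, None]
print(impute_last_known(lactate))  # [None, 1.8, 1.8, 1.8, 2.4, 2.4]
```

The sketch makes the baseline's limits visible: every filled value is a single number with no uncertainty attached, and neither the time elapsed since the last observation nor any other measurement influences it.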

## Abstract

Healthcare data, particularly in critical care settings, presents three key challenges for analysis. First, physiological measurements come from different sources but are inherently related. Yet, traditional methods often treat each measurement type independently, losing valuable information about their relationships. Second, clinical measurements are collected at irregular intervals, and these sampling times can carry clinical meaning. Finally, missing values are prevalent. Whilst several imputation methods exist to tackle this common problem, they often fail to address the temporal nature of the data or provide estimates of uncertainty in their predictions.

We propose using deep Gaussian process emulation with stochastic imputation, a methodology initially conceived to deal with computationally expensive models and uncertainty quantification, to solve the problem of handling missing values that naturally occur in critical care data. This method leverages longitudinal and cross-sectional information and provides uncertainty estimation for the imputed values. Our evaluation on a clinical dataset shows that the proposed method performs better than conventional methods, such as multiple imputation by chained equations (MICE), last-known value imputation, and individually fitted Gaussian processes (GPs).
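
To make the idea of imputation with uncertainty concrete, the following is a minimal sketch of the simplest GP-based baseline mentioned above: a single Gaussian process fitted to one irregularly sampled measurement. It is not the deep GP with stochastic imputation that the paper proposes, and it uses no cross-sectional information. The squared-exponential kernel, the fixed hyperparameters, the function names, and the data are all assumptions made for illustration; the authors' implementation is in the repository linked below.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=2.0, variance=1.0):
    """Squared-exponential covariance between two sets of time points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_impute(t_obs, y_obs, t_miss, noise=1e-2, lengthscale=2.0, variance=1.0):
    """Posterior mean and standard deviation of the latent signal at t_miss.

    Hyperparameters are fixed by hand here (an assumption); in practice they
    would be estimated from the data.
    """
    mean = y_obs.mean()  # constant prior mean taken from the observed values
    K = rbf_kernel(t_obs, t_obs, lengthscale, variance) + noise * np.eye(len(t_obs))
    K_s = rbf_kernel(t_miss, t_obs, lengthscale, variance)
    K_ss_diag = np.full(len(t_miss), variance)  # k(t, t) for the RBF kernel

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - mean))
    mu = mean + K_s @ alpha

    v = np.linalg.solve(L, K_s.T)
    var = K_ss_diag - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))


# Hypothetical series: hours since admission and a standardised measurement.
t_obs = np.array([0.0, 1.0, 2.5, 6.0, 7.0])
y_obs = np.array([0.2, 0.5, 0.9, -0.3, -0.6])
t_miss = np.array([3.0, 4.5])  # times at which the value is missing

mu, sd = gp_impute(t_obs, y_obs, t_miss)
for t, m, s in zip(t_miss, mu, sd):
    print(f"t={t:.1f} h: imputed {m:.2f} +/- {1.96 * s:.2f} (approx. 95% interval)")
```

Unlike last-known value imputation, each imputed value comes with a standard deviation that grows as the missing time point moves away from the observations, and the irregular sampling times enter the model directly through the kernel.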

The source code of the experiments is freely available at: https://github.com/aliakbars/dgpsi-picu.

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** hepatocellular carcinoma (MESH:D006528), acidosis (MESH:D000138), sleep disorder (MESH:D012893), alkalosis (MESH:D000471), COPD (MESH:D029424), asthma (MESH:D001249), LGP (MESH:D010335)
- **Chemicals:** DGP (-), CO2 (MESH:D002245), phosphate (MESH:D010710), Cl (MESH:D002713), Na (MESH:D012964), K (MESH:D011188), urea (MESH:D014508), lactate (MESH:D019344)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12776352/full.md

---
Source: https://tomesphere.com/paper/PMC12776352