# Admission blood tests predicting survival of SARS-CoV-2 infected patients: a practical implementation of graph convolution network in imbalance dataset

**Authors:** Jie Lian, Fan Huang, Xinhai Huang, Kitty Yu-Yeung Lau, Kei Shing Ng, Carlin Chun Fai Chu, Simon Ching Lam, Mohamad Koohli-Moghadam, Varut Vardhanabhuti

PMC · DOI: 10.1186/s12879-024-09699-x · BMC Infectious Diseases · 2024-08-09

## TL;DR

This study uses graph convolutional networks to predict the survival of hospitalised COVID-19 patients from admission blood tests, age and sex, improving predictive accuracy on a highly imbalanced dataset.

## Contribution

The novel use of graph convolutional networks addresses class imbalance in predicting survival of SARS-CoV-2 patients.

## Key findings

- The GCN model achieved an AUC of 0.944, significantly outperforming other models in predicting survival outcomes.
- The model demonstrated good discriminability between low- and high-risk patients using Kaplan-Meier estimates.
- The GCN model showed inadequate separation between false negative and true negative groups in subanalysis.

## Abstract

Predicting an individual’s risk of death from COVID-19 is essential for planning and optimising resources. However, since the real-world mortality rate is relatively low, particularly in places like Hong Kong, this makes building an accurate prediction model difficult due to the imbalanced nature of the dataset. This study introduces an innovative application of graph convolutional networks (GCNs) to predict COVID-19 patient survival using a highly imbalanced dataset. Unlike traditional models, GCNs leverage structural relationships within the data, enhancing predictive accuracy and robustness. By integrating demographic and laboratory data into a GCN framework, our approach addresses class imbalance and demonstrates significant improvements in prediction accuracy.
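A common conventional remedy for the kind of class imbalance described above is random oversampling. Minority-class rows (here, deaths) are resampled with replacement until every class has as many rows as the largest one. The following is a minimal NumPy sketch of that generic technique. It assumes a feature matrix `X` and integer labels `y`. The function name `random_oversample` and the toy data are hypothetical, and this is not necessarily the oversampling scheme the study used for its comparator models.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate rows of smaller classes (sampled with
    replacement) until every class matches the largest class in size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx_parts = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # The largest class gets zero extra rows; smaller classes are topped up
        extra = rng.choice(cls_idx, size=n_max - n, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    rng.shuffle(idx)  # mix classes so duplicated rows are not grouped together
    return X[idx], y[idx]

# Synthetic toy data: 95 survivors (label 0) vs 5 deaths (label 1)
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.array([0] * 95 + [1] * 5)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # [95 95]
```

Oversampling is normally applied to the training split only, so that duplicated minority rows never leak into the evaluation set.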

The cohort included all consecutive positive COVID-19 patients fulfilling study criteria admitted to 42 public hospitals in Hong Kong between January 23 and December 31, 2020 (n = 7,606). We proposed the population-based graph convolutional neural network (GCN) model which took blood test results, age and sex as inputs to predict the survival outcomes. Furthermore, we compared our proposed model to the Cox Proportional Hazard (CPH) model, conventional machine learning models, and oversampling machine learning models. Additionally, a subgroup analysis was performed on the test set in order to acquire a deeper understanding of the relationship between each patient node and its neighbours, revealing possible underlying causes of the inaccurate predictions.
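The population-graph idea can be sketched as follows. Each patient is a node whose features are the blood-test results plus age and sex. Edges link patients judged similar, and each graph convolution mixes a node's features with its neighbours' before classification. The block below is a minimal NumPy sketch of a standard two-layer GCN forward pass that uses the symmetric normalisation of A + I from Kipf and Welling. It is not the authors' implementation. The edge rule (same sex and an age gap of at most `max_age_gap` years), the layer sizes, the untrained random weights, the synthetic inputs and all helper names are hypothetical assumptions for illustration.

```python
import numpy as np

def build_population_graph(age, sex, max_age_gap=5.0):
    """Hypothetical edge rule (an assumption, not the study's published rule):
    connect two patients if they share sex and their ages differ by at most
    max_age_gap years."""
    same_sex = sex[:, None] == sex[None, :]
    close_age = np.abs(age[:, None] - age[None, :]) <= max_age_gap
    A = (same_sex & close_age).astype(float)
    np.fill_diagonal(A, 0.0)  # self-loops are added during normalisation
    return A

def normalize_adjacency(A):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2, as in Kipf and Welling."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, W1, W2):
    """Two-layer GCN: softmax(A_norm @ ReLU(A_norm @ X @ W1) @ W2)."""
    H = np.maximum(A_norm @ X @ W1, 0.0)          # layer 1 + ReLU
    logits = A_norm @ H @ W2                      # layer 2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)   # row-wise softmax

rng = np.random.default_rng(0)
n_patients, n_tests = 8, 4
age = rng.integers(20, 90, size=n_patients).astype(float)
sex = rng.integers(0, 2, size=n_patients)
blood = rng.normal(size=(n_patients, n_tests))  # synthetic stand-in for z-scored labs
X = np.column_stack([blood, (age - age.mean()) / age.std(), sex])

A_norm = normalize_adjacency(build_population_graph(age, sex))
W1 = rng.normal(scale=0.1, size=(X.shape[1], 16))  # untrained weights
W2 = rng.normal(scale=0.1, size=(16, 2))
probs = gcn_forward(A_norm, X, W1, W2)  # per-patient class probabilities
print(probs.shape)  # (8, 2)
```

In a transductive setup of this kind, training and test patients typically sit in the same graph and the loss is computed only on labelled training nodes. Because the weights here are random, the output probabilities carry no meaning until the model is trained.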

The GCN model was the top-performing model, with an AUC of 0.944, considerably outperforming all other models (p < 0.05), including the oversampled CPH model (0.708), linear regression (0.877), Linear Discriminant Analysis (0.860), K-nearest neighbours (0.834), Gaussian predictor (0.745) and support vector machine (0.847). With Kaplan-Meier estimates, the GCN model demonstrated good discriminability between low- and high-risk individuals (p < 0.0001). Based on subanalysis using the weighted-in score, although the GCN model was able to discriminate well between different predicted groups, the separation was inadequate between false negative (FN) and true negative (TN) groups.
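The two evaluation quantities reported above can be computed from first principles. AUC is the probability that a randomly chosen death receives a higher risk score than a randomly chosen survivor (the Mann-Whitney formulation). The Kaplan-Meier product-limit estimate of survival would be computed separately for the predicted low- and high-risk groups. The following NumPy sketch uses hypothetical function names and made-up toy inputs. It is not the study's evaluation code, and it omits the significance tests (for example, a log-rank test) behind the reported p-values.

```python
import numpy as np

def roc_auc(y_true, score):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 0.5 (Mann-Whitney formulation)."""
    y_true = np.asarray(y_true, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y_true], score[~y_true]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def kaplan_meier(time, event):
    """Product-limit estimator. event=1 means death observed, 0 means censored.
    Returns the distinct death times and survival probability just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    death_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in death_times:
        at_risk = np.sum(time >= t)                  # still under observation at t
        deaths = np.sum((time == t) & (event == 1))  # deaths occurring at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return death_times, np.array(surv)

# Toy example: two survivors (0) and two deaths (1)
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Toy follow-up: days to death (event=1) or censoring (event=0)
times, surv = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(times, np.round(surv, 3))
```

To reproduce a risk-group comparison, one would threshold the model's predicted probability into low- and high-risk groups and call `kaplan_meier` on each group's follow-up times and outcomes.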

The GCN model considerably outperformed all other machine learning methods and baseline CPH models. Thus, when applied to this imbalanced COVID survival dataset, adopting a population graph representation may be an approach to achieving good prediction.

The online version contains supplementary material available at 10.1186/s12879-024-09699-x.

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096), COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** death (MESH:D003643), COVID (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11313168/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11313168/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/PMC11313168/full.md

---
Source: https://tomesphere.com/paper/PMC11313168