# Big data analysis for Covid-19 in hospital information systems

**Authors:** Xinpa Ying, Haiyang Peng, Jun Xie

PMC · DOI: 10.1371/journal.pone.0294481 · PLOS ONE · 2024-05-22

## TL;DR

This paper introduces a deep learning framework to improve the accuracy of identifying COVID-19 from CT images across different hospital systems.

## Contribution

A novel deep learning framework that handles domain discrepancies in multi-site datasets for improved COVID-19 identification.

## Key findings

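- The proposed method outperforms the original COVID-Net trained on each dataset by 13.27% and 15.15% in AUC on the two public CT datasets, respectively, and also exceeds existing multi-site learning methods.
- The framework improves prediction accuracy and learning efficiency through a redesigned COVID-Net architecture and learning strategy, combined with independent feature normalization in latent space.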
- Contrastive training enhances domain invariance and classification performance across datasets.
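
To make the normalization finding concrete, here is a minimal sketch of one way to read "independent feature normalization in latent space": each site's features are standardized with that site's own statistics, so scale differences between sites are removed before classification. This summary does not spell out the paper's exact layer, so the function name `normalize_per_site`, the toy features, and the site labels `"A"` and `"B"` are all illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import defaultdict


def normalize_per_site(features, sites, eps=1e-5):
    """Standardize each feature dimension using statistics from its own site only.

    Hypothetical helper: `features` is a list of equal-length float lists,
    `sites` is a parallel list of site labels.
    """
    # Group row indices by site label.
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)

    dim = len(features[0])
    out = [None] * len(features)
    for idxs in by_site.values():
        n = len(idxs)
        # Per-dimension mean and biased variance computed within this site.
        means = [sum(features[i][d] for i in idxs) / n for d in range(dim)]
        variances = [
            sum((features[i][d] - means[d]) ** 2 for i in idxs) / n
            for d in range(dim)
        ]
        for i in idxs:
            out[i] = [
                (features[i][d] - means[d]) / math.sqrt(variances[d] + eps)
                for d in range(dim)
            ]
    return out


# Toy latent features from two hypothetical sites with very different scales.
features = [[1.0, 10.0], [3.0, 30.0], [100.0, 0.1], [300.0, 0.3]]
sites = ["A", "A", "B", "B"]
normalized = normalize_per_site(features, sites)
```

After this step the two sites occupy a comparable range even though their raw features differ by orders of magnitude. In a trained network the same idea is usually realized with separate normalization layers per site and shared weights elsewhere, but that detail is an assumption here.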

## Abstract

The COVID-19 pandemic has triggered a global public health crisis, affecting hundreds of countries. With the increasing number of infected cases, developing automated COVID-19 identification tools based on CT images can effectively assist clinical diagnosis and reduce the tedious workload of image interpretation. To expand the dataset for machine learning methods, it is necessary to aggregate cases from different medical systems to learn robust and generalizable models. This paper proposes a novel joint deep learning framework that can effectively handle heterogeneous datasets with distribution discrepancies for accurate COVID-19 identification. We address the cross-site domain shift by redesigning COVID-Net’s network architecture and learning strategy, and by applying independent feature normalization in latent space, to improve prediction accuracy and learning efficiency. Additionally, we propose using a contrastive training objective to enhance the domain invariance of semantic embeddings and boost classification performance on each dataset. We develop and evaluate our method with two large-scale public COVID-19 diagnosis datasets containing CT images. Extensive experiments show that our method consistently improves performance on both datasets, outperforming the original COVID-Net trained on each dataset by 13.27% and 15.15% in AUC, respectively, and also exceeding existing state-of-the-art multi-site learning methods.
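
The abstract does not give the form of the contrastive training objective, so the following is only a sketch of one common choice: a supervised contrastive loss in which embeddings that share a diagnosis label are pulled together regardless of which site they came from. The function names, the temperature value, and the toy embeddings are assumptions made for illustration and may differ from the paper's actual objective.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    Positives for an anchor are all other samples with the same label.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp over all other samples, shifted for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings: the first two point one way, the last two another.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(embeddings, [1, 1, 0, 0])
loss_mixed = supervised_contrastive_loss(embeddings, [1, 0, 1, 0])
```

When labels agree with the geometry of the embeddings (`loss_aligned`), the loss is small; when same-label samples sit far apart (`loss_mixed`), it is large. Minimizing such a loss over samples pooled from both sites is one way to encourage the domain-invariant embeddings the abstract describes.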

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382), infected (MESH:D007239)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11111070/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11111070/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC11111070/full.md

---
Source: https://tomesphere.com/paper/PMC11111070