# Detection of Severe Lung Infection on Chest Radiographs of COVID-19 Patients: Robustness of AI Models across Multi-Institutional Data

**Authors:** André Sobiecki, Lubomir M. Hadjiiski, Heang-Ping Chan, Ravi K. Samala, Chuan Zhou, Jadranka Stojanovska, Prachi P. Agarwal

PMC · DOI: 10.3390/diagnostics14030341 · Diagnostics · 2024-02-05

## TL;DR

This paper presents deep learning models that distinguish severe from non-severe lung infection in COVID-19 patients on chest X-rays, reaching AUCs of 0.81–0.89 on independent test sets drawn from four multi-country, multi-institutional datasets.

## Contribution

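The paper develops and evaluates deep learning models (a U-Net for lung segmentation, followed by Inception-v1 and Inception-v4 classifiers) for classifying severe vs. non-severe lung infection on multi-institutional and multi-country CXR data, and tests their reproducibility and generalizability.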

## Key findings

- Inception-v4 models achieved higher AUC (0.85–0.89) than Inception-v1 models (0.81–0.84) on independent test sets.
- Reproducibility was studied across different combinations of training and validation datasets, and generalizability was evaluated on independent internal and external test sets.
- The AI models show promise in differentiating severe from non-severe lung infection in COVID-19 patients on chest radiographs.
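
The findings above are reported as AUC values, summarized as mean ± standard deviation. As a minimal sketch of how such figures can be computed, the Python below derives AUC from the rank-based (Mann-Whitney) definition and summarizes several runs. The function names `roc_auc` and `summarize`, the label coding (1 = severe, 0 = non-severe), and the toy scores are illustrative assumptions, not the authors' code or data. The summary does not state how the paper's ± values were obtained, so the sample standard deviation used here is also an assumption.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a severe case (label 1) receives a higher
    score than a non-severe case (label 0); tied scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Mean and sample standard deviation of AUCs from several runs
    (hypothetical: e.g. one AUC per training/validation combination)."""
    return mean(aucs), stdev(aucs)


if __name__ == "__main__":
    # Toy scores only; these are not results from the paper.
    labels = [0, 0, 1, 1]
    scores = [0.10, 0.40, 0.35, 0.80]
    print(f"AUC = {roc_auc(labels, scores):.2f}")

    m, s = summarize([0.80, 0.90])
    print(f"AUC over runs = {m:.2f} ± {s:.2f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; for datasets of several thousand images, a sort-based implementation or an established library routine would be the more practical choice.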

## Abstract

The diagnosis of severe COVID-19 lung infection is important because it carries a higher risk for the patient and requires prompt treatment with oxygen therapy and hospitalization, while patients with less severe lung infection often remain under observation. In addition, patients with severe infections are more likely to have long-standing residual changes in their lungs and may need follow-up imaging. We have developed deep learning neural network models for classifying severe vs. non-severe lung infections in COVID-19 patients on chest radiographs (CXR). A deep learning U-Net model was developed to segment the lungs. Inception-v1 and Inception-v4 models were trained for the classification of severe vs. non-severe COVID-19 infection. Four CXR datasets from multi-country and multi-institutional sources were used to develop and evaluate the models. The combined dataset consisted of 5748 cases and 6193 CXR images with physicians’ severity ratings as the reference standard. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance. We studied the reproducibility of classification performance using different combinations of training and validation datasets. We also evaluated the generalizability of the trained deep learning models using both independent internal and external test sets. On the independent test sets, the Inception-v1 based models achieved AUCs ranging from 0.81 ± 0.02 to 0.84 ± 0.0, while the Inception-v4 models achieved AUCs ranging from 0.85 ± 0.06 to 0.89 ± 0.01. These results demonstrate the promise of deep learning models for differentiating COVID-19 patients with severe lung infection from those with non-severe lung infection on chest radiographs.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** Lung Infection (MESH:D012141), infections (MESH:D007239), COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10855789/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10855789/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC10855789/full.md

---
Source: https://tomesphere.com/paper/PMC10855789