# Comparative performance of twelve machine learning models in predicting COVID-19 mortality risk in children: a population-based retrospective cohort study in Brazil

**Authors:** Adriano Lages dos Santos, Maria Christina L. Oliveira, Enrico A. Colosimo, Robert H. Mak, Clara C. Pinhati, Stella C. Gallante, Hercílio Martelli-Júnior, Ana Cristina Simões e Silva, Eduardo A. Oliveira

PMC · DOI: 10.7717/peerj-cs.2916 · PeerJ Computer Science · 2025-05-28

## TL;DR

This study compares 12 machine learning models for predicting mortality risk in children hospitalized with COVID-19 in Brazil and finds logistic regression to be the most accurate.

## Contribution

The study evaluates multiple machine learning (ML) models for predicting mortality among children and adolescents hospitalized with COVID-19 in Brazil, identifying key risk factors and the best-performing algorithms.

## Key findings

- Logistic Regression performed best, achieving 92.5% accuracy and an AUC of 80.1% on the optimized dataset.
- Gradient Boosting and AdaBoost followed closely with similar performance.
- Reduced oxygen saturation, comorbidities, and older age were key predictors of mortality.
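Accuracy and AUC measure different things: accuracy depends on a fixed decision threshold, while AUC summarizes how well a model ranks deaths above survivors across all thresholds. The following is a minimal, standard-library Python sketch of these metrics (plus precision, sensitivity and specificity), assuming binary labels (1 = death, 0 = survival) and predicted death probabilities. The 0.5 threshold and the toy data are hypothetical illustrations, not values from the study.

```python
"""Sketch of threshold-based metrics and ROC AUC for a binary outcome.

Assumptions (hypothetical): labels are 1 = death, 0 = survival, and the
model outputs a predicted probability of death for each patient.
"""


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def auc(y_true, y_prob):
    """ROC AUC as the probability that a randomly chosen death is scored
    higher than a randomly chosen survivor (ties count one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up toy data: 8 survivors and 2 deaths.
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    y_prob = [0.05, 0.1, 0.2, 0.1, 0.3, 0.15, 0.6, 0.2, 0.4, 0.9]
    print(threshold_metrics(y_true, y_prob))
    print(auc(y_true, y_prob))  # 0.9375
```

On the same toy labels, a model that assigns every patient the same low probability still reaches 80% accuracy but only 0.5 AUC, which is why accuracy on a minority outcome such as death is best read alongside AUC and sensitivity.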

## Abstract

The COVID-19 pandemic has catalyzed the application of advanced digital technologies such as artificial intelligence (AI) to predict mortality in adult patients. However, the development of machine learning (ML) models for predicting outcomes in children and adolescents with COVID-19 remains limited. This study aimed to evaluate the performance of multiple ML models in forecasting mortality among hospitalized pediatric COVID-19 patients. In this cohort study, we used the SIVEP-Gripe dataset, a public resource maintained by the Brazilian Ministry of Health to track severe acute respiratory syndrome (SARS). To create subsets for training and testing the ML models, we divided the primary dataset into three parts. Using these subsets, we developed and trained 12 ML algorithms to predict the outcomes. We assessed the performance of these models using metrics such as accuracy, precision, sensitivity (recall), and area under the receiver operating characteristic curve (AUC).
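The summary states only that the dataset was divided into three parts. A common reading is a train/validation/test split; the sketch below assumes that reading, with hypothetical 60/20/20 proportions and stratification on the outcome so that each part keeps a similar proportion of deaths. The `death` key, the record format and the proportions are illustrative assumptions, not the authors' actual procedure.

```python
import random


def stratified_three_way_split(records, label_key="death",
                               fractions=(0.6, 0.2, 0.2), seed=42):
    """Split records into (train, validation, test), stratified on a label.

    Hypothetical choices: the 60/20/20 fractions, the `death` label key
    and stratification itself are assumptions made for illustration.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("need three fractions that sum to 1")
    rng = random.Random(seed)  # fixed seed for a reproducible split
    # Group records by outcome so every part gets a similar death rate.
    groups = {}
    for rec in records:
        groups.setdefault(rec[label_key], []).append(rec)
    train, val, test = [], [], []
    for label in sorted(groups):
        group = groups[label][:]
        rng.shuffle(group)
        n = len(group)
        n_train = round(n * fractions[0])
        n_val = round(n * fractions[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # remainder goes to test
    return train, val, test


if __name__ == "__main__":
    # Made-up cohort: 100 patients, of whom 10 died.
    data = [{"id": i, "death": 1 if i < 10 else 0} for i in range(100)]
    tr, va, te = stratified_three_way_split(data)
    print(len(tr), len(va), len(te))  # 60 20 20
```

In this scheme the validation part would be used for tuning and model selection, and the test part held back for the final comparison of the 12 algorithms.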

Among the 37 variables examined, 24 were identified as potential predictors of mortality by the chi-square test of independence. The Logistic Regression (LR) algorithm achieved the highest performance on the optimized dataset, with an accuracy of 92.5% and an AUC of 80.1%. The gradient boosting classifier (GBC) and AdaBoost (ADA) closely followed LR, producing similar results. Our study also revealed that reduced baseline oxygen saturation, presence of comorbidities, and older age were the most relevant factors in predicting mortality in children and adolescents hospitalized with SARS-CoV-2 infection. ML models can be an asset in clinical decision-making and in implementing evidence-based patient management strategies, which can improve patient outcomes and the overall quality of medical care. The LR, GBC, and ADA models were effective at predicting mortality in pediatric COVID-19 patients.
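Screening candidate predictors with the chi-square test of independence can be sketched for the simplest case: a binary predictor against the binary outcome, giving a 2x2 table with 1 degree of freedom, where the p-value has the closed form erfc(sqrt(x/2)). This is an illustration under stated assumptions only: the study examined 37 variables, some of which may have more than two categories, and the variable names, counts and the 0.05 threshold below are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table

                    death  survival
        exposed       a        b
        unexposed     c        d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("a row or column total is zero; test undefined")
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total * column total / n.
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def screen_variables(tables, alpha=0.05):
    """Keep variables whose 2x2 table rejects independence at level alpha.

    `tables` maps a (hypothetical) variable name to its (a, b, c, d) counts.
    """
    kept = []
    for name, counts in tables.items():
        _, p = chi2_2x2(*counts)
        if p < alpha:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Made-up counts, not data from the study.
    tables = {
        "low_spo2": (30, 170, 20, 780),
        "sex_male": (26, 474, 24, 476),
    }
    print(screen_variables(tables))  # ['low_spo2']
```

With these made-up counts, `low_spo2` has a large statistic (about 52.6) and is retained, while `sex_male` (about 0.08) is dropped.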

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096), SARS (MONDO:0005091)

## Full-text entities

- **Diseases:** SARS (MESH:D045169), COVID-19 (MESH:D000086382)
- **Chemicals:** oxygen (MESH:D010100)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12192853/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12192853/full.md

## References

97 references — full list in the complete paper: https://tomesphere.com/paper/PMC12192853/full.md

---
Source: https://tomesphere.com/paper/PMC12192853