# Data Analytics and Machine Learning Models on COVID-19 Medical Reports Enhanced with XAI for Usability

**Authors:** Oliver Lohaj, Ján Paralič, Zuzana Paraličová, Daniela Javorská, Elena Zagorová

PMC · DOI: 10.3390/diagnostics15151981 · Diagnostics · 2025-08-07

## TL;DR

This paper explores machine learning models to predict the severity and mortality risk of hospitalized COVID-19 patients, using explainable AI to improve usability and decision-making in healthcare.

## Contribution

The study introduces a web-based application with explainable AI (XAI) to enhance model usability and identifies key features affecting patient outcomes using SHAP values.

## Key findings

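- LightGBM was the best model for predicting disease severity, with 88.4% accuracy using all features and 89.5% using the eight most important features.
- LightGBM was also the best model for mortality risk, reaching a ROC AUC score of 83.7% and 81.2% accuracy using all features.
- A simplified mortality model trained on the 15 most important features performed almost as well (ROC AUC 83.6%, accuracy 80.5%); the simplified models were deployed in a web application and tested with medical experts.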

## Abstract

Objective: To identify effective data analytics and machine learning solutions that can support decision-making in the medical domain and contribute to the understanding of COVID-19 disease. In this study, we analyze data from anonymized electronic medical records of 4711 patients with COVID-19 disease admitted to hospital in Atlanta.

Methods: We used random forest, LightGBM, XGBoost, CatBoost, KNN, SVM, logistic regression, and MLP neural network models. The models were evaluated with metrics appropriate to each type of prediction, chiefly accuracy, F1-score, and ROC AUC score. A further aim was to find out which factors most affected severity and mortality risk among the patients. To identify the important features, we used several statistical methods, LASSO regression, and SHAP values, an explainable artificial intelligence (XAI) method for model explainability. The best models were implemented in a web application and tested by medical experts. The model for prediction of mortality risk was tested on a validation cohort of 45 patients from the Department of Infectiology and Travel Medicine, L. Pasteur University Hospital in Košice (UNLP).

Results: The best model for predicting COVID-19 disease severity was the LightGBM model, with an accuracy of 88.4% using all features and 89.5% using the eight most important features. The best model for predicting mortality risk was also the LightGBM model, which achieved a ROC AUC score of 83.7% and a classification accuracy of 81.2% using all features. Using a simplified model trained on the 15 most important features, the ROC AUC score was 83.6% and the classification accuracy was 80.5%. We deployed the simplified models for predicting COVID-19 disease severity and for predicting the risk of COVID-19-related death in a web-based application and tested them with medical experts. This test resulted in a ROC AUC score of 83.6% and an overall prediction accuracy of 73.3%.
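The abstract reports both ROC AUC and classification accuracy, and the two can diverge: ROC AUC depends only on how the model ranks patients, while accuracy also depends on the decision threshold applied to the predicted risk. The sketch below illustrates that difference in plain Python. The labels, scores, and thresholds are hypothetical toy values, not data from the study, and the helper names `roc_auc` and `accuracy` are introduced here for illustration only. It is not the authors' evaluation code, and it does not claim to explain the specific numbers reported above.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive case gets the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of correct predictions when scores >= threshold are labeled 1."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical toy data: 1 = died, 0 = survived; scores are predicted risks.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.60, 0.35, 0.40, 0.70, 0.90]

print(roc_auc(y_true, scores))          # ranking quality, threshold-free
print(accuracy(y_true, scores, 0.5))    # accuracy at the default threshold
print(accuracy(y_true, scores, 0.35))   # accuracy at a lower threshold
```

On this toy data the ROC AUC stays the same whichever threshold is chosen, while accuracy changes with it, which is one reason studies such as this one report both metrics.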

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** death (MESH:D003643), COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12346136/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12346136/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC12346136/full.md

---
Source: https://tomesphere.com/paper/PMC12346136