# A methodological showcase: utilizing minimal clinical parameters for early-stage mortality risk assessment in COVID-19-positive patients

**Authors:** Jonathan K. Yan

PMC · DOI: 10.7717/peerj-cs.2017 · 2024-04-30

## TL;DR

This study shows that a model using just six key clinical features predicts mortality in COVID-19 patients nearly as accurately (90-91%) as a model using 24 features (92%).

## Contribution

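The paper introduces a methodology that weights clinical features by Shapley values and Random Forest feature importance, then trains a COVID-19 mortality model on only the six highest-weighted features, reaching accuracy close to that of a 24-feature model.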

## Key findings

- A model using six clinical features predicted mortality with 90% accuracy (random forest classifier) and 91% accuracy (logistic regression).
- The six features included acute kidney injury, glucose level, age, troponin, oxygen level, and acute hepatic injury.
- Performance with six features was close to a model using 24 features (92% accuracy).

## Abstract

The scarcity of data is likely to have a negative effect on machine learning (ML). Yet, in the health sciences, data is diverse and can be costly to acquire. Therefore, it is critical to develop methods that can reach similar accuracy with minimal clinical features. This study explores a methodology that aims to build a model using minimal clinical parameters to reach performance comparable to that of a model trained with a more extensive list of parameters. To develop this methodology, a dataset of over 1,000 COVID-19-positive patients was used. Models built with Random Forest (RF) and logistic regression on 24 combined clinical parameters reached over 90% accuracy. Furthermore, to obtain minimal clinical parameters to predict the mortality of COVID-19 patients, the features were weighted using both Shapley values and RF feature importance to identify the most important factors. The six most highly weighted features that could produce the highest performance metrics were combined for the final model. The accuracy of the final model, which used a combination of six features, is 90% with the random forest classifier and 91% with the logistic regression model. This performance is close to that of a model using 24 combined features (92%), suggesting that highly weighted minimal clinical parameters can be used to reach similar performance. The six clinical parameters identified here are acute kidney injury, glucose level, age, troponin, oxygen level, and acute hepatic injury. Among those parameters, acute kidney injury was the highest-weighted feature. Taken together, these results describe a methodology that uses a minimal set of clinical parameters to reach performance metrics similar to those of a model trained on a much more extensive list of parameters, highlighting a novel approach to address the problems of clinical data collection for machine learning.
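The rank-then-retrain idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the clinical records, and ranks features by Random Forest impurity importance only (the Shapley-value weighting is omitted). The feature names, split ratio, and hyperparameters are all hypothetical, and the printed accuracies say nothing about the paper's results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical table (assumption): 1,000 rows, 24 features.
X, y = make_classification(
    n_samples=1000, n_features=24, n_informative=6, n_redundant=4, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: fit a Random Forest on all 24 features and record its test accuracy.
rf_full = RandomForestClassifier(n_estimators=200, random_state=0)
rf_full.fit(X_train, y_train)
full_acc = accuracy_score(y_test, rf_full.predict(X_test))

# Step 2: rank features by impurity-based importance and keep the top six.
top_idx = np.argsort(rf_full.feature_importances_)[::-1][:6]
top_features = [feature_names[i] for i in top_idx]

# Step 3: retrain both model families on the six selected features only.
rf_small = RandomForestClassifier(n_estimators=200, random_state=0)
rf_small.fit(X_train[:, top_idx], y_train)
rf_small_acc = accuracy_score(y_test, rf_small.predict(X_test[:, top_idx]))

lr_small = LogisticRegression(max_iter=1000)
lr_small.fit(X_train[:, top_idx], y_train)
lr_small_acc = accuracy_score(y_test, lr_small.predict(X_test[:, top_idx]))

print("selected features:", top_features)
print(f"RF, 24 features: {full_acc:.3f}")
print(f"RF, 6 features:  {rf_small_acc:.3f}")
print(f"LR, 6 features:  {lr_small_acc:.3f}")
```

Comparing the six-feature accuracies against `full_acc` mirrors the comparison the abstract reports. Ranking on the training split only, as done here, keeps the held-out test set out of the feature-selection step.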

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** acute kidney injury (MESH:D058186), acute hepatic injury (MESH:D056486), COVID-19 (MESH:D000086382)
- **Chemicals:** oxygen (MESH:D010100), glucose (MESH:D005947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11157615/full.md

---
Source: https://tomesphere.com/paper/PMC11157615