# Prediction of in-hospital death among patients admitted to a tertiary care hospital over the first 10 years: a machine learning approach

**Authors:** Edel Rafael Rodea-Montero, Brenda Jesús Rodríguez-Alcántar, Dagoberto Armenta-Medina

PMC · DOI: 10.3389/fpubh.2025.1635708 · Frontiers in Public Health · 2025-10-06

## TL;DR

This retrospective study fits machine learning models to predict in-hospital death from data available at admission, so that high-risk patients can be monitored more closely during their stay.

## Contribution

The main contribution is an XGBoost model that predicts in-hospital mortality from admission data in a Mexican tertiary care hospital, reaching an AUC of 0.9162.

## Key findings

- XGBoost outperformed logistic regression and random forest in predicting in-hospital death (AUC = 0.9162).
- Key predictive factors included medical service, number of conditions, and admission diagnosis according to ICD-10.
- XGBoost achieved 87% sensitivity and 81.3% specificity in identifying high-risk patients.

## Abstract

The objective was to describe the pre- and post-admission characteristics of patients hospitalized in a tertiary care hospital, and to fit machine learning models that predict in-hospital death and identify the factors with the greatest prognostic value for it.

This was a retrospective study based on data from patients who were discharged from a Mexican tertiary care hospital during its first 10 years of operation (2007–2016). Preadmission characteristics were analyzed using descriptive statistics. Comparison tests (Mann–Whitney U) and association tests (chi-square) were applied according to the absence or presence of in-hospital death. Multivariate models (logistic regression, random forest and XGBoost) were fitted. Their ROC curves were compared using the DeLong test, and performance metrics were evaluated.
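The models above are compared by the area under the ROC curve. As a minimal sketch of what that number measures, the AUC can be computed as the fraction of (death, survival) pairs in which the patient who died received the higher predicted risk, which is the same quantity that underlies the Mann–Whitney U statistic. The function name `auc_from_scores` and the toy data are assumptions for illustration only. They are not the authors' code or data, and the sketch does not implement the DeLong test.

```python
from itertools import product


def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 = in-hospital death, 0 = discharged alive.
    scores: predicted risk, higher meaning more likely to die.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not taken from the study).
labels = [0, 0, 1, 0, 1, 0, 1, 0]
scores = [0.05, 0.10, 0.80, 0.30, 0.25, 0.02, 0.60, 0.40]
toy_auc = auc_from_scores(labels, scores)  # 13 of 15 pairs ranked correctly
```

The pairwise loop is quadratic in the number of cases, so it is suited to explanation rather than to a dataset of tens of thousands of discharges; a rank-based implementation gives the same value more efficiently.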

Of the 55,253 hospital discharges considered, 45,011 (patients aged 0–101 years) had complete data; the rate of in-hospital death was 4.17%. In total, 70% of the data were used for training and 30% for testing. Pairwise comparisons between areas under the curve (AUCs) revealed that XGBoost (AUC = 0.9162) outperformed logistic regression (AUC = 0.9036) and random forest (AUC = 0.8978) (p-value < 0.001 in both cases). XGBoost had a sensitivity of 87%, a specificity of 81.3% and a balanced accuracy of 84.2%. The most relevant predictive factors were the medical service that performed the admission, the number of conditions, origin from the hospital's outpatient consultation, the main condition diagnosed at admission according to the ICD-10, age, month of admission, and day of the week of admission.
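The 84.2% figure reported for XGBoost is the mean of sensitivity and specificity, commonly called balanced accuracy. The sketch below shows the arithmetic and why it is more informative than raw accuracy when only 4.17% of cases are deaths. The confusion-matrix counts are hypothetical, chosen only so that the rates match the reported 87% and 81.3%; they are not the study's counts.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of actual deaths that the model flags as high risk."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of survivors that the model correctly leaves unflagged."""
    return tn / (tn + fp)


def balanced_accuracy(sens: float, spec: float) -> float:
    """Mean of sensitivity and specificity."""
    return (sens + spec) / 2


# Hypothetical counts (assumption): 100 deaths and 1,000 survivors.
tp, fn = 87, 13
tn, fp = 813, 187

sens = sensitivity(tp, fn)            # 0.87
spec = specificity(tn, fp)            # 0.813
bal = balanced_accuracy(sens, spec)   # about 0.8415

# A classifier that predicts survival for everyone flags no deaths
# (sensitivity 0) and clears every survivor (specificity 1).
always_survive = balanced_accuracy(0.0, 1.0)  # 0.5
```

The mean of the two rounded rates is about 84.15%, consistent with the reported 84.2% once rounding of the inputs is taken into account. With a death rate of 4.17%, the always-survive classifier would score roughly 95.8% raw accuracy while identifying no high-risk patient, which is why its balanced accuracy of 50% is the more honest summary.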

Owing to its ability to capture complex patterns, the XGBoost model makes it possible to identify patients with a relatively high risk of in-hospital death using the data available at hospital admission. This constitutes a support tool for decision-making, helping to determine which patients require closer monitoring and follow-up during their hospital stay to improve the quality of medical care.

## Full-text entities

- **Diseases:** death (MESH:D003643)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12536027/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12536027/full.md

## References

65 references — full list in the complete paper: https://tomesphere.com/paper/PMC12536027/full.md

---
Source: https://tomesphere.com/paper/PMC12536027