# Limitations of Retrospective Machine Learning Models for Predicting Tracheostomy After Cardiac Surgery

**Authors:** Felix Wiesmueller, Johannes Rösch, Stephan Kersting, Thomas Strecker

PMC · DOI: 10.3390/diagnostics16050771 · 2026-03-04

## TL;DR

This study shows that machine learning models trained on retrospective clinical data do not reliably predict tracheostomy after cardiac surgery.

## Contribution

The study shows that both an existing prediction model and newly developed machine learning models discriminate poorly when predicting tracheostomy from retrospective clinical data.

## Key findings

- The existing model had an AUC of 0.57, indicating poor discrimination.
- Newly developed machine learning models also showed poor diagnostic performance.
- Prospective data collection and physiological or imaging-based diagnostics may improve predictions.
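To put the reported AUC of 0.57 in context, the following is a minimal sketch of how AUC can be computed as the fraction of (positive, negative) patient pairs that a model ranks correctly. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The function name `roc_auc` and all labels and scores are hypothetical toy values for illustration only; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This pairwise definition is equivalent to the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels (1 = tracheostomy, 0 = no tracheostomy).
toy_labels = [1, 0, 1, 0]

# Scores that separate the classes perfectly.
auc_perfect = roc_auc(toy_labels, [0.9, 0.2, 0.8, 0.1])

# Identical scores for everyone carry no ranking information.
auc_chance = roc_auc(toy_labels, [0.5, 0.5, 0.5, 0.5])

# One of the four positive/negative pairs is ranked the wrong way round.
auc_partial = roc_auc(toy_labels, [0.9, 0.8, 0.3, 0.1])

print(auc_perfect, auc_chance, auc_partial)
```

Read this way, an AUC of 0.57 means the model ranks a randomly chosen tracheostomy patient above a randomly chosen non-tracheostomy patient only slightly more often than a coin flip would.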

## Abstract

Background/Objectives: Early tracheostomy seems favorable in prolonged ventilated patients after surgery. Hence, predicting tracheostomy after cardiac surgery is essential. Recently proposed prediction models aim to support this decision-making process, but their diagnostic validity across other patient populations remains uncertain.

Methods: A retrospective single-center study was performed at a university hospital. The patient sample included consecutive patients between 2010 and 2020 who underwent cardiac surgery. Patients who underwent tracheostomy after cardiac surgery were assigned to the intervention group. Control group patients, who had not undergone tracheostomy, were randomly assigned to the group. An existing model was evaluated by receiver operating characteristics curve analysis. Four sets of risk features were chosen depending on results from regression analysis, lasso regularization, random forest or clinical domain knowledge. Newly developed models were created using machine learning methods: random forest, naïve Bayes, nearest neighbor and deep learning. Multiple models were trained with either feature set and then assessed using confusion matrices on an independent test set.

Results: A total of 4744 patients were included in this study. One-hundred and eighteen patients were included in the tracheostomy group. Diagnostic accuracy of the existing model showed insufficient discrimination (area under the curve (AUC) = 0.57). Likewise, newly developed models also showed overall poor diagnostic discrimination across all feature sets and algorithms.

Conclusions: This study shows the diagnostic limitations of retrospective clinical data for the diagnostic prediction of tracheostomy, thereby informing the design of future prospective diagnostic studies. Training new models should not rely on retrospective data alone. Instead, prospective data collection and integration of physiological or imaging-based diagnostics could likely contribute to the development of a good classifier.
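The abstract reports that models were assessed with confusion matrices, and the cohort counts (118 tracheostomy patients among 4744) suggest why raw accuracy can mislead for a rare outcome. The sketch below is an illustration only: it assumes the remaining 4626 patients had no tracheostomy, applies a trivial "always predict no tracheostomy" rule to the whole cohort, and does not reproduce the study's actual test split, which the abstract does not describe. The helper `confusion_metrics` is a hypothetical name.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive basic metrics from the four cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Cohort-level counts from the abstract.
n_total = 4744
n_tracheostomy = 118
# Assumption: every other included patient had no tracheostomy.
n_no_tracheostomy = n_total - n_tracheostomy

# A rule that never predicts tracheostomy: no true or false positives.
always_negative = confusion_metrics(
    tp=0, fp=0, tn=n_no_tracheostomy, fn=n_tracheostomy
)

for name, value in always_negative.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the trivial rule reaches an accuracy of about 0.975 while detecting no tracheostomy patient at all, which is why sensitivity, specificity, and AUC are more informative than accuracy for an outcome this rare.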

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12984356/full.md

---
Source: https://tomesphere.com/paper/PMC12984356