# Comparing the predictive discrimination of machine learning models for ordinal outcomes: A case study of dehydration prediction in patients with acute diarrhea

**Authors:** Kexin Qu, Monique Gainey, Samika S. Kanekar, Sabiha Nasrim, Eric J. Nelson, Stephanie C. Garbern, Mahmuda Monjory, Nur H. Alam, Adam C. Levine, Christopher H. Schmid

PMC · DOI: 10.1371/journal.pdig.0000820 · PLOS Digital Health · 2025-05-06

## TL;DR

This study compares machine learning and regression models for predicting dehydration severity in acute diarrhea patients, emphasizing the importance of external validation.

## Contribution

The paper compares four regression and machine learning methods for an ordinal outcome using three ordinal discrimination indices (ORC, GC, and ADC), bootstrap internal validation, and an external test set.

## Key findings

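- Random forest (RF) had the best discrimination on the training data, but after bootstrap internal validation its performance was closer to that of the other three methods, and it showed a particularly large reduction on the external test set (as did RIDGE).
- Proportional Odds Logistic Regression (POLR) had the best performance on the test dataset and was the most efficient model, with the smallest final model size.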
- Internal validation overestimated model performance, highlighting the need for external validation in clinical prediction models.
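
To make the idea of bootstrap internal validation concrete, the following is a minimal sketch of one widely used optimism-correction scheme. This summary does not say which three bootstrap algorithms the paper uses, so the scheme shown here is an assumption, and the names `optimism_corrected`, `fit`, and `metric` are hypothetical placeholders for any model-fitting routine and any discrimination index.

```python
import random


def optimism_corrected(data, fit, metric, n_boot=200, seed=0):
    """Optimism-corrected performance estimate (illustrative sketch).

    data   : list of observations
    fit    : callable taking a list of observations, returning a model
    metric : callable taking (model, observations), returning a number
    """
    rng = random.Random(seed)

    # Apparent performance: model fitted and evaluated on the same data.
    apparent = metric(fit(data), data)

    optimism = 0.0
    for _ in range(n_boot):
        # Resample the observations with replacement.
        boot = [rng.choice(data) for _ in data]
        model = fit(boot)
        # Optimism: performance on the bootstrap sample the model was
        # fitted to, minus performance of that model on the original data.
        optimism += metric(model, boot) - metric(model, data)

    # Subtract the average optimism from the apparent performance.
    return apparent - optimism / n_boot
```

Even a corrected estimate of this kind is computed only from the development data, which is consistent with the finding above that performance can still drop on a prospective external test set.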

## Abstract

Many comparisons of statistical regression and machine learning algorithms to build clinical predictive models use inadequate methods to build regression models and do not have proper independent test sets on which to externally validate the models. Proper comparisons for models of ordinal categorical outcomes do not exist. We set out to compare model discrimination for four regression and machine learning methods in a case study predicting the ordinal outcome of severe, some, or no dehydration among patients with acute diarrhea presenting to a large medical center in Bangladesh using data from the NIRUDAK study derivation and validation cohorts. Proportional Odds Logistic Regression (POLR), penalized ordinal regression (RIDGE), classification trees (CART), and random forest (RF) models were built to predict dehydration severity and compared using three ordinal discrimination indices: ordinal c-index (ORC), generalized c-index (GC), and average dichotomous c-index (ADC). Performance was evaluated on models developed on the training data, on the same models applied to an external test set, and through internal validation with three bootstrap algorithms to correct for overoptimism. RF had superior discrimination on the original training data set, but its performance was more similar to the other three methods after internal validation using the bootstrap. Performance for all models was lower on the prospective test dataset, with particularly large reductions for RF and RIDGE. POLR had the best performance in the test dataset and was also most efficient, with the smallest final model size. Clinical prediction models for ordinal outcomes, just like those for binary and continuous outcomes, need to be prospectively validated on external test sets if possible because internal validation may give an overly optimistic picture of model performance.
Regression methods can perform as well as more automated machine learning methods if constructed with attention to potential nonlinear associations. Because regression models are often more interpretable clinically, their use should be encouraged.
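
As an illustration of the three discrimination indices named in the abstract, here is a minimal sketch under two assumptions: each patient has a single numeric risk score (for example a linear predictor), and the indices follow one common reading of their names. ORC is taken as the unweighted mean of the c-indices for every pair of outcome categories, GC as the same pairwise comparison weighted by the number of patient pairs, and ADC as the mean of the c-indices obtained by dichotomizing the outcome at each cutpoint. The paper's exact definitions may differ, and the function names are hypothetical.

```python
from itertools import combinations


def c_index(scores_low, scores_high):
    """Share of (low, high) pairs in which the higher-category case has
    the higher score; tied scores count as one half."""
    total = 0.0
    for lo in scores_low:
        for hi in scores_high:
            if hi > lo:
                total += 1.0
            elif hi == lo:
                total += 0.5
    return total / (len(scores_low) * len(scores_high))


def ordinal_indices(y, score):
    """Return ORC, GC, and ADC for ordinal labels y and one score per case."""
    levels = sorted(set(y))
    by_level = {k: [s for yi, s in zip(y, score) if yi == k] for k in levels}
    pairs = list(combinations(levels, 2))  # (lower, higher) category pairs

    # ORC (assumed): unweighted mean of the pairwise c-indices.
    pair_c = {p: c_index(by_level[p[0]], by_level[p[1]]) for p in pairs}
    orc = sum(pair_c.values()) / len(pairs)

    # GC (assumed): pairwise c-indices weighted by number of case pairs.
    weight = {p: len(by_level[p[0]]) * len(by_level[p[1]]) for p in pairs}
    gc = sum(pair_c[p] * weight[p] for p in pairs) / sum(weight.values())

    # ADC (assumed): mean c-index over the K-1 dichotomizations of y.
    cut_c = []
    for cut in levels[1:]:
        low = [s for yi, s in zip(y, score) if yi < cut]
        high = [s for yi, s in zip(y, score) if yi >= cut]
        cut_c.append(c_index(low, high))
    adc = sum(cut_c) / len(cut_c)

    return {"ORC": orc, "GC": gc, "ADC": adc}
```

For example, with outcomes coded 0 (no), 1 (some), and 2 (severe dehydration), `ordinal_indices([0, 1, 2], [1, 3, 2])` gives ORC and GC of 2/3 and an ADC of 0.75, showing that the indices can disagree on the same predictions.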

## Linked entities

- **Diseases:** acute diarrhea (MONDO:0000257)

## Full-text entities

- **Diseases:** dehydration (MESH:D003681), acute diarrhea (MESH:D000208)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12054866/full.md

## Figures

23 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12054866/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12054866/full.md

---
Source: https://tomesphere.com/paper/PMC12054866