# Applying machine learning to predict stunting in children under 5 years old based on water, sanitation and hygiene behaviors and infrastructure

**Authors:** Sanaya Sinharoy, Heather Reese, Thomas Clasen, Sheela S. Sinharoy, Ashish Khobragade

PMC · DOI: 10.1371/journal.pone.0343796 · 2026-03-05

## TL;DR

This study uses machine learning to predict childhood stunting from water, sanitation, and hygiene (WaSH) factors in rural Odisha, India, and finds that extreme gradient boosting performs best, with 88% accuracy and an AUROC of 0.959.

## Contribution

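The study compares five machine learning classifiers and several feature-engineering techniques for predicting stunting from WaSH data, and identifies extreme gradient boosting with forward selection as the best-performing combination.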

## Key findings

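- Extreme gradient boosting with forward selection achieved 88% accuracy in predicting stunting, correctly identifying 81% of stunted and 92% of non-stunted children.
- Four WaSH factors had the strongest predictive power: improved sanitation coverage, presence of a handwashing station, piped water coverage, and availability of a preferred drinking water source.
- The model had an AUROC of 0.959 (95% CI: 0.949–0.968), indicating strong discrimination between stunted and non-stunted children.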

## Abstract

Child stunting continues to pose a substantial global health challenge, requiring multifaceted strategies that combine conventional epidemiological approaches with advanced analytic methods. The aim of this study was to determine the most effective machine learning model for predicting stunting based on water, sanitation, and hygiene behaviors and infrastructure, with the goal of identifying high-risk children who would benefit most from targeted interventions.

This study was a secondary analysis of data from a matched cohort study assessing the effectiveness of combined on-premise piped water and improved sanitation for improving health outcomes in rural Odisha, India. Data for the parent study were collected from 2,398 households with a child under five years of age across 90 villages, and complete data were available for 1,196 children. To identify the most relevant predictors, four feature engineering techniques were used: structural equation modeling, forward selection, backward elimination, and least absolute shrinkage and selection operator (LASSO). Five machine learning algorithms commonly used for binary classification were then compared: logistic regression, classification tree, support vector machine, neural network, and extreme gradient boosting.
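
Forward selection, one of the feature-engineering techniques listed above, works as a greedy wrapper: start with no predictors, repeatedly add the candidate that most improves a model score, and stop when no candidate helps. The following is a minimal, stdlib-only sketch under stated assumptions: the scorer `lookup_accuracy` is a hypothetical stand-in (a majority-class lookup table scored on its own training data) rather than the extreme gradient boosting models the study fitted, and the feature names, `min_gain` threshold, and toy rows are invented for illustration, not drawn from the Odisha cohort.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

Row = Dict[str, int]


def lookup_accuracy(rows: Sequence[Row], labels: Sequence[int], feats: List[str]) -> float:
    """Hypothetical stand-in scorer: training accuracy of a lookup table that
    predicts the majority label for each pattern of the chosen binary features."""
    votes: Dict[tuple, Counter] = defaultdict(Counter)
    for row, y in zip(rows, labels):
        votes[tuple(row[f] for f in feats)][y] += 1
    # Predicting the majority label in each group gets its largest count right.
    correct = sum(c.most_common(1)[0][1] for c in votes.values())
    return correct / len(labels)


def forward_select(candidates: List[str],
                   score: Callable[[List[str]], float],
                   min_gain: float = 1e-9) -> List[str]:
    """Greedy forward selection: add the best feature until the score stops improving."""
    selected: List[str] = []
    best = score(selected)  # baseline with no predictors (majority class)
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current set.
        trial = {f: score(selected + [f]) for f in remaining}
        f_best = max(remaining, key=lambda f: trial[f])
        if trial[f_best] - best < min_gain:
            break  # no candidate improves the score enough; stop
        selected.append(f_best)
        best = trial[f_best]
        remaining.remove(f_best)
    return selected


# Hypothetical toy data: (improved_sanitation, handwashing_station, random_noise, stunted)
TOY = [
    (0, 0, 0, 1), (0, 0, 0, 1), (0, 0, 1, 1), (0, 0, 1, 1),
    (0, 1, 0, 0), (0, 1, 1, 0),
    (1, 0, 0, 0), (1, 0, 0, 0), (1, 0, 1, 0),
    (1, 1, 0, 0), (1, 1, 1, 0),
]
NAMES = ("improved_sanitation", "handwashing_station", "random_noise")
rows = [dict(zip(NAMES, t[:3])) for t in TOY]
labels = [t[3] for t in TOY]

chosen = forward_select(list(NAMES), lambda feats: lookup_accuracy(rows, labels, feats))
print(chosen)  # ['improved_sanitation', 'handwashing_station']
```

With this lookup-table scorer, training accuracy can never decrease as features are added, so a real pipeline would score candidates on held-out data or by cross-validation; here the `min_gain` threshold is what ends the loop once the uninformative `random_noise` column adds nothing.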

Among 1,196 children analyzed, the extreme gradient boosting model with forward selection feature engineering best predicted stunting based on water, sanitation, and hygiene (WaSH) factors. It correctly identified 81% of stunted children (sensitivity) and 92% of non-stunted children (specificity), with an overall accuracy of 88%. The model’s area under the receiver operating characteristic curve (AUROC) was 0.959 (95% CI: 0.949–0.968), indicating that WaSH factors strongly predict child stunting when analyzed with this machine learning technique. Four WaSH factors were identified as having the strongest power to predict stunting in our sample: improved sanitation coverage, presence of a handwashing station, piped water coverage, and availability of a preferred drinking water source.
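
The sensitivity (81%), specificity (92%), accuracy (88%) and AUROC reported above are all computed from a model's predicted labels and scores. The following is a minimal plain-Python sketch, assuming 1 codes a stunted child and a hypothetical 0.5 decision threshold; the function names and the ten-child toy example are illustrative, not the study's data or code. AUROC is computed in its Mann-Whitney form: the probability that a randomly chosen stunted child receives a higher score than a randomly chosen non-stunted child.

```python
from typing import Sequence, Tuple


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy), with 1 = stunted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of stunted children correctly flagged
    specificity = tn / (tn + fp)  # share of non-stunted children correctly cleared
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random stunted child > score of a random
    non-stunted child), counting ties as one half (Mann-Whitney form)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions for ten children (four stunted).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_metrics(y_true, y_pred))  # (0.75, 0.8333333333333334, 0.8)
print(round(auroc(y_true, scores), 4))    # 0.9583
```

Sensitivity, specificity and accuracy depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds. The pairwise loop is quadratic in sample size, which is acceptable for a sketch; the paper's 95% confidence interval for the AUROC would need an additional resampling or analytic step not shown here.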

The results demonstrate the potential of machine learning algorithms, especially extreme gradient boosting, to inform targeted WaSH interventions for reducing childhood stunting in resource-limited settings. However, these findings require external validation in other populations, and the complete-case analysis approach (excluding 35% of children with missing data) may limit generalizability to settings with less systematic data collection.
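
A complete-case analysis keeps only children with a recorded value for every analysis variable. The minimal sketch below illustrates that filter and the resulting exclusion share; the function name `complete_cases`, the variable names (including `haz` for height-for-age z-score), and the four toy records are hypothetical and not taken from the study.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[float]]


def complete_cases(records: Sequence[Record], variables: Sequence[str]) -> Tuple[List[Record], float]:
    """Keep records with a non-missing (not None) value for every analysis
    variable; return the retained records and the fraction excluded."""
    kept = [r for r in records if all(r.get(v) is not None for v in variables)]
    excluded = 1 - len(kept) / len(records)
    return kept, excluded


# Hypothetical toy records; None marks a missing measurement.
children = [
    {"haz": -2.3, "improved_sanitation": 1, "handwashing_station": 0},
    {"haz": None, "improved_sanitation": 0, "handwashing_station": 1},
    {"haz": -1.1, "improved_sanitation": 1, "handwashing_station": 1},
    {"haz": -0.4, "improved_sanitation": 0, "handwashing_station": 0},
]
kept, excluded = complete_cases(children, ["haz", "improved_sanitation", "handwashing_station"])
print(len(kept), excluded)  # 3 0.25
```

Dropping rows this way is simple, but it implicitly assumes the excluded children do not differ systematically from those retained, which is the generalizability concern raised above.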

## Linked entities

- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** enteric dysfunction (MESH:D004751), depression (MESH:D003866), fecal contamination (MESH:D005242), infections (MESH:D007239), Stunting (MESH:D006130)
- **Chemicals:** water (MESH:D014867)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12962480/full.md

---
Source: https://tomesphere.com/paper/PMC12962480