# Effects of environment and globalization on the double and triple burdens of infection symptoms among under-five children across low-middle income countries using machine learning algorithms

**Authors:** Haile Mekonnen Fenta, A. Kofi Amegah, Aino K. Rantala, Inês Paciência, Jouni J. K. Jaakkola

PMC · DOI: 10.1186/s40249-025-01387-5 · 2025-11-20

## TL;DR

This study combines multilevel models and machine learning on survey data from 58 low- and middle-income countries to examine how environmental and globalization factors relate to co-occurring fever, cough, and diarrhea in children under five.

## Contribution

The study merges Demographic and Health Surveys data with cluster-level air pollution measures and country-level globalization indicators, then uses machine learning to predict double and triple burdens of childhood infection symptoms.

## Key findings

- Among children under five, 11.9% had a double burden and 3.7% had a triple burden of infection symptoms (fever, cough, and diarrhea).
- Random Forest achieved the highest area under the curve (AUC), 94% and 99% for the double and triple burdens, with 10-fold cross-validation and an 80:20 training and testing split.
- The burdens were associated with the child's location, survey year, wealth index, family size, air pollutants, and other environmental covariates.

## Abstract

Childhood infectious diseases and related symptoms, such as fever, cough, and diarrhea, are the leading cause of death among children in low- and middle-income countries (LMICs). We examined the environmental predictors of double and triple burden (D/TB) of infection symptoms among under-five children using multilevel machine learning (ML) methods.
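To make the outcome concrete, here is a minimal Python sketch of how double and triple burden flags could be derived from three symptom indicators. The `ChildRecord` type and all function names are hypothetical, and the definition of "double" as two or more symptoms is an assumption; the paper may instead count exactly two.

```python
from dataclasses import dataclass


@dataclass
class ChildRecord:
    """One hypothetical survey record: True means the symptom was reported."""
    fever: bool
    cough: bool
    diarrhea: bool


def symptom_count(child: ChildRecord) -> int:
    """Number of the three symptoms reported for this child (0 to 3)."""
    return int(child.fever) + int(child.cough) + int(child.diarrhea)


def has_double_burden(child: ChildRecord) -> bool:
    # Assumption: "double" means two or more symptoms, not exactly two.
    return symptom_count(child) >= 2


def has_triple_burden(child: ChildRecord) -> bool:
    """True when all three symptoms are reported."""
    return symptom_count(child) == 3


def prevalence(children: list[ChildRecord], flag) -> float:
    """Percentage of children for whom `flag` returns True."""
    if not children:
        raise ValueError("need at least one record")
    return 100.0 * sum(flag(c) for c in children) / len(children)
```

Calling `prevalence(records, has_double_burden)` on a list of records gives the kind of percentage the abstract reports; survey weights, which a real DHS analysis would normally apply, are left out of this sketch.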

We used Demographic and Health Surveys (DHS) data from 58 LMICs collected between 2000 and 2023. These data were merged with cluster-level particulate matter and nitrogen dioxide data from the National Aeronautics and Space Administration and country-level data on political, social, and economic globalization from the World Bank report. We applied multilevel models to identify the most important predictors of D/TB symptoms and applied machine learning algorithms to predict these symptoms among children across LMICs. We trained and validated the ML algorithms on 80%, 70%, and 60% of the data and tested them on the remaining 20%, 30%, and 40%, using 2-, 5-, and 10-fold cross-validation.
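The evaluation design described above amounts to nine configurations: three hold-out splits crossed with three fold counts. The standard-library sketch below enumerates them. It assumes the folds are drawn inside the training portion only, which the abstract does not state, and every function name here is hypothetical.

```python
import random
from itertools import product

TRAIN_FRACTIONS = (0.8, 0.7, 0.6)  # training shares named in the abstract
FOLD_COUNTS = (2, 5, 10)           # cross-validation settings named in the abstract


def holdout_split(n_rows: int, train_fraction: float, seed: int = 0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def k_folds(train_indices: list[int], k: int):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [train_indices[i::k] for i in range(k)]
    for i, validate in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


def protocol_grid(n_rows: int):
    """Enumerate every (train fraction, k) pair with its index sets.

    Assumption: cross-validation happens inside the training portion,
    and the test portion is held back for the final evaluation.
    """
    for fraction, k in product(TRAIN_FRACTIONS, FOLD_COUNTS):
        train, test = holdout_split(n_rows, fraction)
        yield fraction, k, train, test, list(k_folds(train, k))
```

In practice a model such as a Random Forest would be fitted on each `fit` list, tuned on the matching `validate` list, and scored once on `test`. With clustered survey data, splitting by cluster rather than by row may be preferable, but the abstract does not say which was done.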

Of 1,546,243 children, 19.2%, 20.5%, and 12.6% had fever, cough, and diarrhea, respectively, while the overall prevalence of double and triple burdens was 11.9% and 3.7%, respectively. D/TB were associated with the location of a child, survey year, wealth index, family size, air pollutants, and environmental covariates. The estimated prevalence of both D/TB symptoms varied substantially across districts (intraclass correlation coefficient, ICC = 13.3%) and countries (ICC = 8.8%). Random Forest gave the highest area under the curve (AUC), 94% and 99% for D/TB, under the K10 protocol with an 80:20 training and testing split.
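Because the reported metric is the area under the ROC curve rather than plain accuracy, a short illustration of how AUC is defined may help. The function below is a minimal sketch using the pairwise interpretation; it is not the authors' code, and `roc_auc` is a hypothetical name.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The distinction matters for rare outcomes: with a double burden prevalence of 11.9%, a model that always predicts "no burden" would be right 88.1% of the time yet score an AUC of only 0.5. The nested loop is quadratic, so for a dataset of this size a rank-based implementation would be the practical choice.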

The study found substantial variation in the prevalence of D/TB of illness among children under five and identified several environmental and sociodemographic predictors of these health outcomes. The Random Forest algorithm performed best in predicting these burdens. The findings highlight how integrating environmental and sociodemographic data with machine learning can support targeted interventions to reduce childhood infectious disease burdens in low- and middle-income countries.

The online version contains supplementary material available at 10.1186/s40249-025-01387-5.

## Full-text entities

- **Diseases:** death (MESH:D003643), diarrhea (MESH:D003967), infection (MESH:D007239), cough (MESH:D003371), infectious disease (MESH:D003141), fever (MESH:D005334)
- **Chemicals:** nitrogen dioxide (MESH:D009585)

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12632089/full.md

---
Source: https://tomesphere.com/paper/PMC12632089