# Leveraging machine learning algorithm to predict minimum dietary diversity among children aged 6–23 months in Ethiopia

**Authors:** Naol Gonfa Serbessa, Siraj Muhidin Degefa, Beriso Alemu Hailu, Geleta Nenko Dube, Betelhem Bizuneh Asfawu, Asmamaw Ketemaw Tsehay, Eskedar Ayehu, Mulusew Andualem Asemahegn, Agmasie Damtew Wale, Eden Ketema Woldekidan, Tigist Tolessa Sedi, Asmamaw Deneke, Zehara Jemal Nuriye, Mohammedjud Hassen Ahmed, Habtamu Alganeh Guadie

PMC · DOI: 10.1371/journal.pgph.0005995 · PLOS Global Public Health · 2026-02-26

## TL;DR

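This study trains eight machine learning algorithms on Ethiopian Demographic and Health Survey data (2005–2019) to predict minimum dietary diversity among children aged 6–23 months. Random forest performed best, and the top predictors were household, delivery, and residence characteristics.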

## Contribution

The study introduces a novel application of machine learning algorithms to predict dietary diversity in children using Ethiopian demographic data.

## Key findings

- Random forest achieved the highest performance (82% accuracy) in predicting minimum dietary diversity.
- Key predictors include place of delivery, household characteristics, and regional factors.
- The results suggest that machine learning can help identify the most at-risk populations and inform targeted nutrition interventions.

## Abstract

Lack of nutrient-rich food consumption is considered an important underlying factor affecting the healthy development of children, and can lead to developmental delays and various disorders. There is limited evidence on the predictors of dietary diversity. We aimed to train and test eight machine learning algorithms on Ethiopian Demographic and Health Survey (EDHS) data from 2005–2019. We used secondary data from EDHS 2005, 2011, 2016, and 2019. A total of 8,996 weighted samples of children aged 6–23 months were included in the study. STATA 17 was used to extract variables from the EDHS dataset. Python 3.11 was used for data cleaning, coding, and further analysis. The machine learning algorithms used in this study were logistic regression, random forest, K-nearest neighbors (KNN), multilayer perceptron (MLP), support vector machine, naive Bayes, extreme gradient boosting (XGBoost), and AdaBoost. Furthermore, Shapley additive explanations (SHAP) were used for model interpretability and to identify the top predictors. The random forest classifier (accuracy = 82%, recall = 84.9%, precision = 78.5%, F1-score = 81.7%, area under the curve (AUC) = 89%) was the best model for predicting minimum dietary diversity among children aged 6–23 months. Inadequate minimum dietary diversity remains a significant public health issue in Ethiopia, with important regional and socioeconomic inequalities. The random forest model identified place of delivery, sex of the household head, water source, place of residence, age of the child, number of children under five years of age, the woman's age, and household size as the most important predictors. The results show the value of machine learning in detecting the most at-risk populations and informing specific nutrition interventions.
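For readers less familiar with the metrics reported above, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are derived from a binary confusion matrix. It is not the authors' code: the function names and the confusion-matrix counts are hypothetical and chosen only for illustration. The last lines recompute F1 as the harmonic mean of the reported precision (78.5%) and recall (84.9%), which gives roughly 0.816, close to the reported 81.7%; the small gap may reflect rounding or an averaging choice that the abstract does not specify.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (NOT taken from the paper).
example = classification_metrics(tp=85, fp=23, fn=15, tn=77)

# F1 recomputed from the precision and recall reported in the abstract.
recomputed_f1 = f1_from(0.785, 0.849)

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name}: {value:.3f}")
    print(f"F1 from reported precision/recall: {recomputed_f1:.3f}")
```

Recall weighs missed positives (false negatives), while precision weighs false alarms (false positives). Which one to prioritise depends on the relative cost of each kind of error, so reading both alongside F1 is more informative than accuracy alone.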

## Full-text entities

- **Diseases:** developmental delays (MESH:D002658)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030623/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030623/full.md

## References

74 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030623/full.md

---
Source: https://tomesphere.com/paper/PMC13030623