# Predicting Child Development Across Literacy, Physical, Learning, and Social‐Emotional Domains Using Supervised Machine Learning: A Cross‐Sectional Study Based on MICS 2019 Bangladesh

**Authors:** Faizul Islam, Golam Morshed Suhel, Mahmud Afroz, Md Aminul I. Apu, Md Jewel Rana, Tofajjel Hossain, Zaibunnesa Ziba, Md Fahim Shariar, Mohammad Nayeem Hasan

PMC · DOI: 10.1002/hsr2.71434 · Health Science Reports · 2025-11-04

## TL;DR

This study uses machine learning to predict child development in Bangladesh, identifying key factors like maternal education and socioeconomic status that influence outcomes.

## Contribution

Applies supervised machine learning to predict child development across multiple domains using MICS 2019 data in Bangladesh.

## Key findings

- Random Forest outperformed other models in predicting child development, especially in learning and physical domains.
- Maternal education, child age, regional location, and socioeconomic status were identified as key predictors of child development.
- Delays were highest in literacy-numeracy and social-emotional domains, indicating areas needing targeted interventions.

## Abstract

Early childhood development (ECD) plays a vital role in shaping a child's health and well‐being, influenced by child, family, and environmental factors. To prevent long‐term impairments, early detection and intervention are crucial. Using MICS 2019 data, this study applies supervised machine learning to predict ECD across four key domains and identify the most significant predictors and economic strategies.

This study used data on 9346 children from the Multiple Indicator Cluster Surveys (MICS) 2019 to evaluate and compare five classifiers: CART, Random Forest, XGBoost, Logistic Regression, and Support Vector Machines (SVM). The target variables were four early developmental domains: literacy‐numeracy, physical, learning, and social‐emotional development. Five‐fold cross‐validation was used to estimate test error and reduce bias. The Synthetic Minority Oversampling Technique (SMOTE) was used to handle class imbalance.
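As an illustration of how SMOTE and five‐fold cross‐validation can be combined, the sketch below oversamples the minority class inside each training fold only, leaving the held‐out fold untouched. It is a minimal standard‐library version, not the authors' pipeline: the abstract does not say whether resampling was done per fold, and the function names (`smote`, `stratified_folds`, `resampled_training_sets`), the choice of `k=5` neighbours, and the binary numeric inputs are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def stratified_folds(labels, n_folds=5, rng=None):
    """Split indices into n_folds, keeping class proportions roughly equal."""
    rng = rng or random.Random(0)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def resampled_training_sets(X, y, n_folds=5, seed=0):
    """Yield (X_train, y_train, test_indices) with SMOTE applied to the
    training part of each fold only (assumes two classes)."""
    rng = random.Random(seed)
    for test_idx in stratified_folds(y, n_folds, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        counts = {c: y_tr.count(c) for c in set(y_tr)}
        minority = min(counts, key=counts.get)
        deficit = max(counts.values()) - counts[minority]
        minority_pts = [x for x, lab in zip(X_tr, y_tr) if lab == minority]
        new_pts = smote(minority_pts, deficit, rng=rng)
        yield X_tr + new_pts, y_tr + [minority] * deficit, test_idx
```

Each yielded training set has equal class counts and can be passed to any of the five classifiers, while evaluation on `test_indices` uses only original observations. In practice a maintained implementation such as `SMOTE` from the imbalanced‐learn package would normally replace the hand‐written helper.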

The analysis shows that most children are developing normally in the learning (90.58%) and physical (98.70%) domains, while delays are highest in literacy‐numeracy (71.37%) and social‐emotional (27.57%) domains. Among the machine learning models evaluated, Random Forest consistently performed best across all domains, achieving the highest accuracy, particularly in learning (0.83) and physical (0.97) domains. Feature importance analysis identified maternal education, child age, regional location (Division), and socioeconomic status (Wealth Index) as key predictors. Early childhood education and books read at home also play important roles in cognitive and learning outcomes, guiding targeted interventions for child development.
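One way to read the reported accuracies is against the majority‐class baseline implied by the prevalence figures above, that is, the accuracy of always predicting the more common class. The short calculation below uses only numbers from the abstract; the variable names are illustrative, and the comparison assumes accuracy was measured on the original class distribution, which the abstract does not state.

```python
# Share (%) of the more common class in each domain, taken from the abstract.
majority_share = {
    "literacy-numeracy": 71.37,       # delayed is the majority class
    "physical": 98.70,                # developing normally
    "learning": 90.58,                # developing normally
    "social-emotional": 100 - 27.57,  # not delayed
}

# Random Forest accuracies quoted in the abstract (only two domains are given).
reported_rf_accuracy = {"learning": 0.83, "physical": 0.97}


def majority_baseline(share_percent):
    """Accuracy obtained by always predicting the majority class."""
    return round(share_percent / 100, 4)


baselines = {d: majority_baseline(s) for d, s in majority_share.items()}

for domain, acc in reported_rf_accuracy.items():
    print(f"{domain}: baseline={baselines[domain]:.4f}, reported RF accuracy={acc:.2f}")
```

Under that assumption, the physical and learning baselines are already high because the classes are so imbalanced, which is why class‐specific metrics such as recall or F1 for the delayed group are usually read alongside accuracy.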

The results show notable differences in early childhood development, particularly in social‐emotional and literacy‐numeracy domains. Socioeconomic status, early learning experiences, and parental education are key predictors, while physical and social‐emotional development are influenced by resources, regional factors, and nutrition. These findings can guide targeted interventions and policies for holistic child development.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12586350/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12586350/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/PMC12586350/full.md

---
Source: https://tomesphere.com/paper/PMC12586350