# National data meets AI: Machine learning for predicting overweight/obesity among ever-married Bangladeshi women

**Authors:** Suman Biswas, Md. Mahamudul Islam, Nusrat Islam, Md. Abdur Rahim Mia, Guanghui Liu

PMC · DOI: 10.1371/journal.pone.0341821 · PLOS One · 2026-02-02

## TL;DR

This study uses machine learning to predict overweight and obesity in Bangladeshi women, identifying key risk factors like age, wealth, and TV watching habits.

## Contribution

The study combines a data-balancing algorithm (SMOTE-ENN) with feature selection to predict overweight/obesity among ever-married Bangladeshi women aged 15–49, using nationally representative survey data.

## Key findings

- SVM achieved 95.79% accuracy in predicting overweight/obesity after data balancing and hyper-parameter tuning.
- Wealth status, age, and TV watching frequency were identified as the strongest predictors of overweight/obesity.
- Combining SMOTE-ENN balancing with feature selection was effective for classifying overweight/obese women and could support public health decision-making on preventive measures.
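
The accuracy and precision figures quoted for SVM are standard confusion-matrix ratios. As a reminder of what each one measures, here is a minimal sketch; the counts are made up for illustration and are not the study's data, and the function names are hypothetical.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fp, tn, fn = 90, 10, 80, 20
acc = accuracy(tp, fp, tn, fn)
prec = precision(tp, fp)
print(f"accuracy={acc:.2%}, precision={prec:.2%}")
```

Precision ignores false negatives, so a model can score high precision while still missing many true cases; this is one reason the two numbers are reported together.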

## Abstract

Overweight and obesity have become a critical global health issue because they are strongly associated with an elevated risk of diabetes, stroke, cardiovascular disorders, and certain types of cancer. In recent decades, Bangladesh has seen a notable rise in the prevalence of overweight/obesity, and women are more prone to obesity than men. This study presents a comprehensive strategy for identifying risk factors and predicting overweight and obesity with machine learning (ML) classifiers among ever-married Bangladeshi women aged 15–49 years. Data from the 2017–2018 Bangladesh Demographic and Health Survey (BDHS), a nationally representative survey, were examined. The data were pre-processed and then balanced using the combined synthetic minority over-sampling technique and edited nearest neighbors (SMOTE-ENN) approach. Several feature selection techniques, including Chi-Square, LASSO, and Sequential Forward Selection, were used to determine the key risk features. Permutation feature importance and SHAP analysis were then used to assess the influence of these risk factors on overweight/obesity. Classification was carried out with seven machine learning models: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), K-nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Multilayer Perceptron (MLP). Among the evaluated models, SVM performed best, reaching 95.79% accuracy and 97.32% precision when combined with SMOTE-ENN and hyper-parameter tuning. The key factors contributing to overweight/obesity were age, division, type of residence, educational level of both the respondent and her partner, number of children, frequency of television viewing, and wealth status; of these, wealth status, age, and frequency of television viewing had the strongest influence. Integrating the balancing algorithm with the embedded feature selection strategy was therefore effective in classifying overweight/obese women and could enhance decision-making for preventive measures in public health through timely predictions of overweight/obesity.
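
To make the balancing step in the abstract concrete, the following is a minimal pure-Python sketch of the SMOTE-ENN idea: oversample the minority class by interpolating between minority neighbours, then drop samples whose label disagrees with their nearest neighbours. It is not the authors' implementation. The toy data, the function names, the Euclidean distance, the binary labels, and the majority-vote cleaning rule are all assumptions made for illustration; in practice a library such as imbalanced-learn (`SMOTEENN`) would normally be used, and its defaults may differ.

```python
import math
import random
from collections import Counter


def _knn(idx, points, candidates, k):
    """Indices of the k candidates nearest to points[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority points until both classes have equal size."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.choice(min_idx)                  # a random minority sample
        j = rng.choice(_knn(i, X, min_idx, k))   # one of its minority neighbours
        gap = rng.random()                       # position along the segment i -> j
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Keep only samples whose label matches the majority of their k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in _knn(i, X, range(len(X)), k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k=3, seed=0):
    """SMOTE oversampling followed by ENN cleaning."""
    X_os, y_os = smote(X, y, minority, k, seed)
    return enn(X_os, y_os, k)


# Toy data (hypothetical): six class-0 points near the origin, three class-1
# points near (2, 2), and one class-0 point sitting inside the class-1 cluster.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.3), (0.1, 0.0),
     (2.05, 2.1),
     (2.0, 2.0), (2.1, 2.2), (2.2, 2.0)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = smote_enn(X, y, minority=1)
print(sorted(Counter(y_bal).items()))
```

In this toy case SMOTE brings class 1 up to the size of class 0, and ENN then removes the class-0 point embedded in the class-1 cluster, since all of its nearest neighbours carry the other label.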

## Linked entities

- **Diseases:** diabetes (MONDO:0005015), stroke (MONDO:0005098), cancer (MONDO:0004992)
- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** Overweight (MESH:D050177), cardiovascular disorders (MESH:D002318), diabetes (MESH:D003920), weight gain (MESH:D015430), fatty liver (MESH:D005234), heart attack (MESH:D009203), underweight (MESH:D013851), hypertension (MESH:D006973), stroke (MESH:D020521), osteoarthritis (MESH:D010003), Obesity (MESH:D009765), cancer (MESH:D009369), type 2 diabetes (MESH:D003924), dementia (MESH:D003704), sleep apnea (MESH:D012891)
- **Species:** gut metagenome (species) [taxon 749906], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12863505/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12863505/full.md

## References

71 references — full list in the complete paper: https://tomesphere.com/paper/PMC12863505/full.md

---
Source: https://tomesphere.com/paper/PMC12863505