# Classification of knowledge of fertility period among adolescent girls in East Africa from 2012 to 2022: Machine learning algorithm

**Authors:** Andualem Addisu Birlie, Kassahun Dessie Gashu, Mulugeta Desalegn Kasaye, Ayana Alebachew Muluneh, Abdulaziz Kebede Kassaw, Hailemariam Kassahun Desalegn, Tamir Wondim Desta, Shimels Derso Kebede, Laura Sbaffi

PMC · DOI: 10.1371/journal.pdig.0001108 · PLOS Digital Health · 2026-02-23

## TL;DR

This study uses machine learning to classify whether adolescent girls in East Africa know their fertility period. It finds that education and exposure to health communication are key predictors.

## Contribution

The study introduces a machine learning approach to classify fertility knowledge and identifies key predictors using SHAP analysis.

## Key findings

- 13.22% of adolescent girls in East Africa had knowledge of their fertility period.
- Random forest achieved an AUC of 91.12% and an accuracy of 83.26% on balanced training data, outperforming the other models.
- Education level, health communication, and wealth index were top predictors of fertility knowledge.
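Two back-of-the-envelope calculations help put the 13.22% prevalence in context. The sketch below is an illustration under stated assumptions, not the paper's method. It uses an unweighted Wald interval that assumes simple random sampling. DHS surveys use weighted, clustered designs, so the result only approximates the reported 95% CI of 12.91% to 13.54%. The sketch also computes the accuracy of a trivial model that always predicts "no knowledge". Any accuracy figure for this outcome should be read against that baseline.

```python
import math

# Figures reported in the summary: 40,664 adolescent girls, 13.22% with knowledge.
n = 40_664
p_hat = 0.1322

# Wald 95% interval, ASSUMING simple random sampling (no survey weights or
# clustering), so it will not match the paper's interval exactly.
z = 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - z * se, p_hat + z * se

# A "classifier" that always predicts the majority class (no knowledge)
# is correct for every girl who lacks knowledge of the fertility period.
majority_accuracy = 1 - p_hat

print(f"Approximate 95% CI: ({100 * ci_low:.2f}%, {100 * ci_high:.2f}%)")
print(f"Majority-class accuracy: {100 * majority_accuracy:.2f}%")
```

Under these assumptions the interval comes out at about (12.89%, 13.55%), close to the reported one. The majority-class baseline is 86.78%, which is higher than the 82.71% accuracy reported for logistic regression. This is one reason AUC, which equals 0.5 for any constant predictor, is a more informative headline metric for such an imbalanced outcome.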

## Abstract

Understanding the timing of the menstrual cycle can help women avoid pregnancy without surgical, hormonal, or mechanical contraception. Women who do not use contraception and do not know when they are fertile are at higher risk (17%) of unplanned pregnancy and abortion. Classifying knowledge of the fertility period with machine learning algorithms can help automate decision-making, produce more precise and accurate classifications, and scale to large and complex datasets. This study therefore aimed to classify knowledge of the fertility period among adolescent girls in East Africa from 2012 to 2022 using machine learning algorithms.

A community-based cross-sectional design was applied to Demographic and Health Survey (DHS) datasets from 12 East African countries spanning 2012–2022. Machine learning algorithms were applied to classify knowledge of the fertility period and identify its predictors using R and Python (Jupyter Notebook in Anaconda). The workflow included data cleaning, one-hot encoding, data splitting, data balancing, and ten-fold cross-validation. Ten machine learning algorithms were compared to select the best model, and SHAP was used to interpret it.

Of the 40,664 adolescent girls in East Africa, 13.22% (95% CI: 12.91, 13.54) had knowledge of the fertility period. Logistic regression performed best on unbalanced training data, with an AUC of 74.38% and an accuracy of 82.71%. Random forest performed best on balanced training data, with an AUC of 91.12% and an accuracy of 83.26%. The key determinants of knowledge of the fertility period were education level, country, hearing about family planning, hearing about sexually transmitted infections, wealth index, knowledge of any method, and visiting health facilities.
Governments, NGOs, policy makers, and researchers can utilize these findings to design targeted interventions for improving adolescents’ reproductive health based on the identified gaps and disparities.
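The preprocessing and evaluation steps listed in the abstract can be sketched with scikit-learn as below. This is a minimal illustration, not the authors' code, and it rests on several assumptions:

- The data are synthetic.
- The variable names (`education`, `heard_fp`, `wealth`) are hypothetical stand-ins.
- Only two of the ten algorithms are shown.
- Random oversampling of the minority class stands in for the balancing step, because this summary does not name the technique used.

Balancing is applied inside each fold of a stratified ten-fold cross-validation, to the training part only.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.utils import resample

rng = np.random.default_rng(0)

# HYPOTHETICAL stand-in for a pooled DHS extract: three categorical predictors
# and a binary outcome with a minority positive class. Not real survey data.
n = 2000
education = rng.integers(0, 4, n)  # 0 = none ... 3 = higher
heard_fp = rng.integers(0, 2, n)   # heard about family planning (0/1)
wealth = rng.integers(0, 5, n)     # wealth quintile
logit = -3.5 + 0.6 * education + 0.8 * heard_fp + 0.2 * wealth
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X = np.column_stack([education, heard_fp, wealth])


def oversample_minority(X_tr, y_tr, seed):
    """Randomly duplicate class-1 rows until both classes have equal size."""
    X_min, X_maj = X_tr[y_tr == 1], X_tr[y_tr == 0]
    X_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=seed)
    X_bal = np.vstack([X_maj, X_up])
    y_bal = np.concatenate([np.zeros(len(X_maj), int), np.ones(len(X_up), int)])
    return X_bal, y_bal


def make_model(estimator):
    # One-hot encode every column; ignore categories unseen in a training fold.
    return make_pipeline(OneHotEncoder(handle_unknown="ignore"), estimator)


templates = {
    "logistic_regression": make_model(LogisticRegression(max_iter=1000)),
    "random_forest": make_model(RandomForestClassifier(n_estimators=100, random_state=0)),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {name: [] for name in templates}
for fold, (tr, va) in enumerate(cv.split(X, y)):
    # Balance only the training part of the fold; the validation part keeps
    # the original class ratio, so duplicated rows never appear in it.
    X_bal, y_bal = oversample_minority(X[tr], y[tr], seed=fold)
    for name, template in templates.items():
        model = clone(template).fit(X_bal, y_bal)
        scores[name].append(roc_auc_score(y[va], model.predict_proba(X[va])[:, 1]))

mean_auc = {name: float(np.mean(s)) for name, s in scores.items()}
for name, auc in mean_auc.items():
    print(f"{name}: mean 10-fold AUC = {auc:.3f}")
```

Suppose the full dataset were oversampled before cross-validation. Copies of the same minority rows would then land in both training and validation folds, which tends to inflate the AUC. Balancing within each training fold avoids that leakage. In practice, a library such as imbalanced-learn offers SMOTE and pipeline helpers for the same purpose.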

The purpose of this study was to classify knowledge of the fertility period among adolescent girls in East Africa. Ten machine learning models were trained, and the random forest model gave the most accurate predictions, with an AUC of 91.12% and an accuracy of 83.26%. SHAP feature importance identified the top ten predictors. These were led by education level, country, having heard about family planning, having heard about sexually transmitted infections, wealth index, knowledge of any method, and having visited a health facility. Education level was a primary factor, since girls with more education were better prepared to understand reproductive health concepts. Likewise, having heard about family planning and about STIs was associated with better knowledge of the fertility period. This suggests that broader health communication strengthens girls' foundational knowledge. The wealth index also mattered, because it shapes access to media, education, and health services. Girls who knew of a contraceptive method or had ever visited a health facility were significantly more likely to know their fertility period.
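The SHAP step can be illustrated without the `shap` package by using a model whose SHAP values have a closed form. The sketch below swaps in logistic regression for the paper's random forest. For a linear model with independent features, the SHAP value of column j on the log-odds scale is w_j * (x_j - mean(x_j)).

The sketch also shows one way to report a one-hot encoded variable such as country as a single predictor. Because SHAP values are additive, a variable's one-hot columns can be summed per row before taking the mean absolute value. This grouping is an assumption here, not confirmed for the paper. The data and variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(1)

# HYPOTHETICAL categorical data: 'country' has 12 levels, 'heard_fp' has two.
n = 3000
education = rng.integers(0, 4, n)
country = rng.integers(0, 12, n)
heard_fp = rng.integers(0, 2, n)
logit = -2.5 + 0.7 * education + 0.9 * heard_fp + 0.15 * (country % 3)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
raw = np.column_stack([education, country, heard_fp])
names = ["education", "country", "heard_fp"]

enc = OneHotEncoder()
X = enc.fit_transform(raw).toarray()
model = LogisticRegression(max_iter=1000).fit(X, y)

# Exact SHAP values for a linear model with independent features,
# on the log-odds (decision_function) scale: w_j * (x_j - mean(x_j)).
w = model.coef_[0]
shap_values = (X - X.mean(axis=0)) * w

# Map each one-hot column back to the index of its original variable.
col_owner = np.concatenate([np.full(len(c), i) for i, c in enumerate(enc.categories_)])

# Sum a variable's one-hot contributions per row, then take the mean
# absolute value as that variable's global importance.
importance = {}
for i, name in enumerate(names):
    per_row = shap_values[:, col_owner == i].sum(axis=1)
    importance[name] = float(np.mean(np.abs(per_row)))

ranking = sorted(importance, key=importance.get, reverse=True)
for name in ranking:
    print(f"{name}: mean |SHAP| = {importance[name]:.3f}")

# Efficiency property: contributions sum to f(x) - E[f(x)] for every row.
margin = model.decision_function(X)
assert np.allclose(shap_values.sum(axis=1), margin - margin.mean())
```

For tree ensembles such as random forests there is no simple closed form, and packages like `shap` compute the values algorithmically. The grouping and mean-absolute aggregation step works the same way on any SHAP value matrix.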

## Full-text entities

- **Diseases:** STI (MESH:D012749), stillbirths (MESH:D050497), unintended pregnancy (MESH:D011254), infertility (MESH:D007246), infections (MESH:D007239)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12928492/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12928492/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/PMC12928492/full.md

---
Source: https://tomesphere.com/paper/PMC12928492