# Machine learning to examine adequate awareness and positive perception of HIV pre-exposure prophylaxis among women in sub-Saharan Africa: evidence from 2021-2024 surveys

**Authors:** Bewuketu Terefe, Abraham Keffale Mengistu, Andualem Enyew Gedefaw, Eliyas Addisu Taye, Fentahun Bikale Kebede, Jamilu Sani, Nebebe Demis Baykemagn, Tirualem Zeleke Yehuala, Amanuel Worku

PMC · DOI: 10.1186/s12879-025-12032-9 · BMC Infectious Diseases · 2025-11-14

## TL;DR

This study applies machine learning to nationally representative surveys from eight sub-Saharan African countries to identify the factors associated with women's low awareness and poor perception of HIV pre-exposure prophylaxis (PrEP).

## Contribution

The study applies machine learning, with SHAP values for interpretation, to identify the key factors influencing PrEP awareness and perception across eight countries in sub-Saharan Africa.

## Key findings

- Only 14.9% of women in the study had adequate awareness and positive perception of PrEP, ranging from 5.6% in Tanzania to 73.6% in Lesotho.
- Younger age, lower education, limited media exposure, and minimal healthcare engagement were strongly linked to poor PrEP awareness.
- CatBoost was the best-performing of the five machine learning models for predicting PrEP awareness and perception (accuracy 0.91, F1-score 0.88), followed by XGBoost (accuracy 0.89, F1-score 0.86).

## Abstract

Despite the proven efficacy of HIV pre-exposure prophylaxis (PrEP), adequate awareness and positive perception among women in sub-Saharan Africa (SSA) remain poorly understood, limiting uptake. Existing studies are largely country-specific, focus on limited socio-demographic factors, and rarely leverage advanced analytical methods to identify key determinants. This study addresses these gaps by applying machine learning to population-based surveys across multiple SSA countries.

We analyzed nationally representative surveys from eight SSA countries conducted between 2021 and 2024, including 123,132 HIV negative women aged 15–49 years. Primary outcomes were adequate awareness and positive perception of PrEP. Predictor variables included socio-demographic characteristics, behavioral factors, healthcare utilization, and contextual features. Data preprocessing included multiple imputation, one-hot encoding, and min–max scaling. Recursive feature elimination and correlation analysis guided feature selection. Five machine learning models—KNN, XGBoost, CatBoost, LightGBM, and Gradient Boosting—were trained and evaluated using accuracy, precision, recall, F1-score, and ROC AUC. SHAP values provided interpretable insights.
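As a rough illustration of two of the preprocessing steps named above, the sketch below implements one-hot encoding and min-max scaling in plain Python. The function names (`one_hot`, `min_max`), the `is_` column prefix, and the toy records are assumptions made for this example; they are not taken from the paper, which does not describe its implementation in this summary.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str]) -> List[Dict[str, int]]:
    """Return one 0/1 indicator per observed category for each value."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy records, not survey data.
education = ["none", "primary", "secondary", "primary"]
age = [15, 49, 32, 24]

encoded = one_hot(education)
scaled = min_max(age)

print(encoded[0])
print(scaled)
```

Min-max scaling matters most for distance-based models such as KNN, where an unscaled variable with a wide range would otherwise dominate the distance calculation; tree-based boosting models are largely insensitive to it.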

Only 14.9% of women demonstrated adequate awareness and positive perception of PrEP, with marked variation across countries (5.6% in Tanzania to 73.6% in Lesotho). Younger age (15–24 years), lower education, limited media exposure, and minimal healthcare engagement were strongly associated with inadequate awareness. CatBoost outperformed other models (accuracy 0.91, F1-score 0.88), followed by XGBoost (accuracy 0.89, F1-score 0.86). SHAP analysis confirmed age, education, media exposure, healthcare visits, and marital status as the most influential predictors.
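To make the reported metrics concrete, the following minimal sketch computes accuracy, precision, recall, and F1-score from binary labels. The helper `classification_metrics` and the toy labels are hypothetical. The second example uses a made-up sample of 1,000 women with the 14.9% prevalence reported above to show why F1-score is a useful companion to accuracy when the positive class is rare; this summary does not state how the authors handled class imbalance.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels: 1 = adequate awareness and positive perception.
small = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])

# Hypothetical sample of 1,000 women at 14.9% prevalence, scored by a
# trivial classifier that always predicts the majority class (0).
y_true = [1] * 149 + [0] * 851
y_pred = [0] * 1000
baseline = classification_metrics(y_true, y_pred)

print(small)
print(baseline)
```

In this hypothetical baseline, always predicting "inadequate" yields an accuracy of 0.851 but an F1-score of 0, so a model's accuracy is best read against that floor and alongside its F1-score.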

Awareness and positive perception of PrEP among women in SSA remain inadequate and unevenly distributed, highlighting urgent gaps in education and outreach. Machine learning effectively identifies key drivers, enabling targeted interventions to improve PrEP uptake across diverse socio-demographic contexts. These findings can inform country-specific PrEP awareness campaigns and policy strategies to enhance HIV prevention efforts.

The online version contains supplementary material available at 10.1186/s12879-025-12032-9.

## Full-text entities

- **Species:** Human immunodeficiency virus 1 (no rank) [taxon 11676], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12619335/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12619335/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/PMC12619335/full.md

---
Source: https://tomesphere.com/paper/PMC12619335