# Predicting early infant diagnosis (EID) results for HIV exposed infants in a resource-limited setting using machine learning models: evidence from Amhara Public Health Institute data (2024/2025)

**Authors:** Zelalem Yitayal Melese, Mitiku Kassaw Takilo, Abraham Keffale Mengistu, Aynadis Worku Shimie, Gizaw Hailiye Teferi, Ashagrie Anteneh, Wubete Lule Ayalew, Sefefe Birhanu Tizie, Muluken Belachew Mengistie

PMC · DOI: 10.1186/s12879-025-12508-8 · BMC Infectious Diseases · 2026-01-06

## TL;DR

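This paper compares five machine learning models for predicting early infant diagnosis (EID) results among HIV-exposed infants, using 12,129 records from the Amhara Public Health Institute, a resource-limited setting. The Gradient Boosting Model performed best, with an AUC of 99.99%.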

## Contribution

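The study evaluates and compares five machine learning models (Decision Tree, Random Forest, Gradient Boosting Model, Logistic Regression, and Support Vector Machine) for predicting early infant diagnosis results, with the Synthetic Minority Over-Sampling Technique applied to address class imbalance.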

## Key findings

- The Gradient Boosting Model achieved the highest AUC, 99.99%, for predicting EID results in HIV-exposed infants.
- The Support Vector Machine ranked second, with an AUC of 96.58%.
- Random Forest had the lowest AUC, 90.86%, which the authors attribute to its limitations in handling imbalanced datasets.

## Abstract

Early infant diagnosis of human immunodeficiency virus is essential for timely intervention and treatment of exposed infants. Traditional diagnostic approaches often face logistical and cost-related challenges, leading to delayed results and reduced healthcare efficiency. Machine learning provides a promising alternative, enabling earlier identification of at-risk infants and more efficient allocation of healthcare resources.

A cross-sectional study was conducted using early infant diagnosis data from the Amhara Public Health Institute, comprising 12,129 records with 12 features. Machine learning algorithms, including Decision Tree, Random Forest, Gradient Boosting Model, Logistic Regression, and Support Vector Machine, were trained and evaluated using Accuracy, Precision, Recall, F1-score, and Area Under the Curve (AUC). The Synthetic Minority Over-Sampling Technique was applied to address class imbalance.
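To make the oversampling step concrete, the following is a minimal, standard-library sketch of the idea behind the Synthetic Minority Over-Sampling Technique: each synthetic record is an interpolation between a minority-class record and one of its nearest minority-class neighbours. It is a simplified illustration, not the authors' implementation. The function name `smote_oversample`, the toy data, and the parameter defaults are assumptions, and the sketch assumes all features are numeric.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest
    minority-class neighbours (simplified SMOTE; hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up example: three minority-class points in a two-feature space.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
new_points = smote_oversample(minority, n_synthetic=4, k=2)
```

Because every synthetic point lies on a line segment between two real minority points, the new records stay inside the region the minority class already occupies. In practice, oversampling is usually applied to the training split only, so that evaluation data contains no synthetic records; the abstract does not state how the split was handled here.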

The Gradient Boosting Model achieved the highest predictive performance with an AUC of 99.99%, followed by the Support Vector Machine with an AUC of 96.58%. Random Forest demonstrated the lowest performance with an AUC of 90.86%, highlighting its limitations in handling imbalanced datasets.
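For readers less familiar with the AUC figures reported above, here is a small sketch of what the metric measures: the probability that a randomly chosen positive record receives a higher model score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are made up for illustration and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC computed by comparing every positive/negative pair of scores.
    A pair counts 1 if the positive scores higher, 0.5 on a tie."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive EID result) and model scores.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.8, 0.7, 0.9]
auc = roc_auc(labels, scores)  # 7 of 8 pairs are ranked correctly
```

The pairwise form is quadratic in the number of records, so it is meant only to show the definition; library implementations use a sorting-based calculation that gives the same value.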

Ensemble-based models, particularly the Gradient Boosting Model and the Support Vector Machine, significantly enhance the accuracy of HIV early infant diagnosis predictions among exposed infants. These models are therefore recommended as reliable tools for reducing both missed diagnoses and false-positive results in clinical practice.
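The two error types named above map onto standard metrics: missed diagnoses are false negatives, which lower Recall, and false-positive results lower Precision. The sketch below, with a hypothetical `classification_metrics` helper and made-up counts, shows why Accuracy alone can mislead when positive results are rare.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed diagnoses
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up example: 100 infants, 5 truly positive, and a model that
# predicts "negative" for everyone.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
metrics = classification_metrics(y_true, y_pred)
```

In this toy case the accuracy is 0.95 even though every positive infant is missed (recall 0.0), which is why imbalanced problems are usually judged on Recall, Precision, F1-score, and AUC together rather than on Accuracy alone.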

## Full-text entities

- **Species:** Human immunodeficiency virus 1 (no rank) [taxon 11676]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12870318/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12870318/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/PMC12870318/full.md

---
Source: https://tomesphere.com/paper/PMC12870318