# Antenatal prediction of small for gestational age at birth based on four birthweight standards using machine learning algorithms

**Authors:** Qiu-Yan Yu, Ying Lin, Yu-Run Zhou, Xin-Jun Yang, Joris Hemelaar

PMC · DOI: 10.3389/frai.2025.1679979 · Frontiers in Artificial Intelligence · 2026-01-12

## TL;DR

Machine learning and logistic regression models predicted small for gestational age (SGA) births in Chinese pregnancies about equally well across four birthweight standards, and late pregnancy models discriminated best.

## Contribution

The study compares machine learning and logistic regression models for predicting SGA at three pregnancy intervals, with SGA classified under four birthweight standards (China, Intergrowth-21st, FMF, and GROW).

## Key findings

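- Late pregnancy models (<37 weeks) predicted SGA better than middle (<26 weeks) and early (<18 weeks) pregnancy models, reaching a ROC-AUC of 0.79 with Random Forest under the FMF standard.
- Logistic regression and machine learning models performed comparably: logistic regression was marginally better under the China, Intergrowth-21st, and GROW standards, and Random Forest was best under the FMF standard.
- Symphysis-fundal height, maternal abdominal circumference, maternal age, maternal height and weight, and parity were key predictors across standards.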

## Abstract

Accurate antenatal prediction of small for gestational age (SGA) at birth is essential to improve the development and delivery of preventative and therapeutic interventions. This study aimed to assess the performance of machine learning (ML) models in predicting SGA at birth among Chinese pregnancies, with SGA classified according to the Chinese birthweight standard and three international birthweight standards.

We collected multimodal, longitudinal antenatal surveillance data on 350,135 singleton pregnancies in Wenzhou City, China, between January 1, 2014 and December 31, 2016. For each of three pregnancy intervals, we developed ML prediction models for newborns classified as SGA under the China, Intergrowth-21st, Fetal Medicine Foundation (FMF), and Gestation-related Optimal Weight (GROW) standards. We used lasso regression for feature selection, and CatBoost, XGBoost, LightBoost, artificial neural networks, Random Forest, a stacked ensemble model, and logistic regression for predictive modeling on training data sets, with validation on testing data sets.
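
The lasso feature-selection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the binary SGA outcome is modeled with an L1-penalised logistic loss, fits it by plain proximal gradient descent in pure Python, and runs on a four-row toy data set. The function names, the `penalty` value, and the feature names `informative` and `uninformative` are all hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logistic(rows, labels, penalty, learning_rate=0.1, n_iter=500):
    """Minimise mean logistic loss + penalty * sum(|w_j|) by proximal gradient.

    The intercept is left unpenalised. This is an illustrative solver,
    not the one used in the paper.
    """
    n = len(rows)
    weights = [0.0] * len(rows[0])
    intercept = 0.0
    for _ in range(n_iter):
        # Residuals (predicted probability minus label) at the current iterate.
        residuals = [
            sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - y
            for row, y in zip(rows, labels)
        ]
        intercept -= learning_rate * sum(residuals) / n
        for j in range(len(weights)):
            gradient = sum(r * row[j] for r, row in zip(residuals, rows)) / n
            # Gradient step, then soft-thresholding (the L1 proximal operator).
            weights[j] = soft_threshold(
                weights[j] - learning_rate * gradient, learning_rate * penalty
            )
    return intercept, weights


# Toy data (hypothetical): column 0 tracks the label, column 1 does not.
feature_names = ["informative", "uninformative"]
rows = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
labels = [1, 1, 0, 0]

intercept, weights = fit_l1_logistic(rows, labels, penalty=0.1)
# Features whose coefficient was not shrunk to zero are the ones "selected".
selected = [name for name, w in zip(feature_names, weights) if w != 0.0]
print(selected)
```

Only the features that survive the penalty would be passed on to the downstream classifiers. A larger `penalty` removes more features, and a large enough one removes all of them.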

Among 22,603 singleton pregnancies with complete data, the rate of SGA using the China standard was 6.1%, compared to 4.3%, 6.0%, and 9.7% for the Intergrowth-21st, GROW, and FMF standards, respectively. This pattern was maintained in the imputed data set (n = 225,523), with corresponding SGA rates of 6.8% (China), 4.8% (Intergrowth-21st), 7.4% (GROW), and 10.7% (FMF). Late pregnancy models (<37 weeks) had the best power to predict SGA, compared to middle (<26 weeks) and early pregnancy (<18 weeks) models. With the China standard, the logistic regression model in late pregnancy performed best, with an area under the receiver operating characteristic curve (ROC-AUC) of 0.74. Logistic regression also performed better than ML algorithms with the Intergrowth-21st and GROW standards at each pregnancy interval, although differences were small. With the FMF standard, the Random Forest model performed best at each pregnancy interval, reaching a ROC-AUC of 0.79 in late pregnancy. Notably, the middle pregnancy Random Forest model with the FMF standard already attained a ROC-AUC of 0.72 at 26 weeks’ gestation. Symphysis-fundal height, maternal abdominal circumference, maternal age, maternal height and weight, and parity were consistently identified as key predictors of SGA across the different standards.
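
The ROC-AUC values reported above can be read as the probability that a randomly chosen SGA newborn receives a higher predicted risk than a randomly chosen non-SGA newborn, with ties counted as half. The sketch below computes that pairwise definition directly. The labels and scores are made-up illustration values, not study data, and `roc_auc` is a hypothetical helper rather than the evaluation code used in the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for SGA, 0 for non-SGA. scores: predicted risk, higher = riskier.
    Tied pairs count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Illustrative values only: two non-SGA and two SGA newborns.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
print(example_auc)  # 3 of the 4 pairs are ranked correctly -> 0.75
```

On this reading, a ROC-AUC of 0.74 means that about 74% of such SGA versus non-SGA pairs are ranked in the right order, while 0.5 corresponds to chance. The double loop is quadratic in sample size, so a rank-based formula would be the practical choice on a cohort of this scale.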

There are important differences in the classification of SGA at birth between national and international birthweight standards. Both machine learning models and traditional logistic regression demonstrated comparable predictive performance for SGA identification. These findings hold promise for guiding risk-stratified prenatal care and optimizing resource allocation in clinical settings.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12832878/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12832878/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12832878/full.md

---
Source: https://tomesphere.com/paper/PMC12832878