# Machine Learning Models for the Prediction of Preterm Birth at Mid-Gestation Using Individual Characteristics and Biophysical Markers: A Cohort Study

**Authors:** Antonios Siargkas, Ioannis Tsakiridis, Dimitra Kappou, Apostolos Mamopoulos, Ioannis Papastefanou, Themistoklis Dagklis

PMC · DOI: 10.3390/children12111451 · Children · 2025-10-25

## TL;DR

Machine learning models can better predict medically indicated preterm birth than spontaneous preterm birth using mid-pregnancy data, with simpler models performing as well as complex ones.

## Contribution

The study introduces subtype-specific models for iatrogenic and spontaneous preterm birth. It shows that predictor selection and subtype stratification matter more for predictive performance than algorithmic complexity.

## Key findings

- Models for iatrogenic preterm birth had higher AUCs (up to 0.862 at <32 weeks) than models for spontaneous preterm birth (up to 0.749 at <32 weeks).
- Logistic Regression performed comparably to more complex machine learning algorithms (Random Forest, XGBoost, and a Neural Network).
- Iatrogenic PTB predictions were driven by placental dysfunction markers, while spontaneous PTB was linked to cervical length and prior preterm birth history.
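The findings above compare multivariable Logistic Regression with more complex learners. The following is a minimal sketch of what the logistic model computes. It assumes an unpenalised fit by plain batch gradient descent, because the summary does not state the authors' fitting procedure, software, or any regularisation. The stdlib-only Python fits an intercept and coefficients, then converts the weighted sum into a predicted risk. The learning rate, epoch count, and toy data are hypothetical and are not study data.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability sigmoid(b0 + b1*x1 + ... + bp*xp), with w = [b0, b1, ..., bp]."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return sigmoid(z)


def fit_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,       # hypothetical learning rate
    epochs: int = 2000,    # hypothetical number of full passes over the data
) -> List[float]:
    """Fit an unpenalised multivariable logistic model by batch gradient descent.

    Minimises the mean log-loss and returns [b0, b1, ..., bp].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            # d(log-loss)/d(linear predictor) = predicted probability - observed label
            err = predict_risk(w, xi) - yi
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * xi[j]
        # Step against the mean gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Toy, made-up data with a single predictor (not study data).
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]
w_toy = fit_logistic_regression(X_toy, y_toy)
print(w_toy, [round(predict_risk(w_toy, x), 3) for x in X_toy])
```

In practice, a library implementation such as scikit-learn's `LogisticRegression` would normally be used, and it applies an L2 penalty by default. The hand-written loop only makes the weighted-sum-plus-sigmoid structure explicit.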

## Abstract

What are the main findings?
Predictive models demonstrated significantly stronger predictive performance for iatrogenic preterm birth (PTB) compared to spontaneous PTB across all gestational age thresholds. Traditional Logistic Regression performed comparably to more complex machine learning algorithms, indicating that predictor selection and subtype stratification are more critical for performance than algorithmic complexity.

The predictive accuracy of the models consistently improved for earlier, more severe degrees of prematurity for both subtypes. For instance, the top AUC for predicting iatrogenic PTB increased from 0.764 at <37 weeks to 0.862 at <32 weeks.
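Each subtype is modeled at three gestational-age cut-offs (<32, <34, and <37 weeks), so each model needs its own binary outcome. Below is a minimal sketch of one way to derive these labels. It assumes each delivery is recorded with its gestational age in weeks and an onset type. The field names `ga_weeks` and `onset` are hypothetical. Coding preterm births of the competing subtype as non-cases is also an assumption, not the authors' documented choice.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal, Tuple

Onset = Literal["spontaneous", "iatrogenic"]


@dataclass
class Delivery:
    ga_weeks: float  # hypothetical field: gestational age at delivery, in weeks
    onset: Onset     # hypothetical field: spontaneous vs medically indicated delivery


def subtype_label(d: Delivery, subtype: Onset, threshold_weeks: int) -> int:
    """Return 1 if the delivery is a PTB of `subtype` before `threshold_weeks`, else 0.

    Assumption: births of the other subtype and births at or after the
    threshold are both coded as non-cases (0).
    """
    return int(d.ga_weeks < threshold_weeks and d.onset == subtype)


def labels_for_all(deliveries: List[Delivery]) -> Dict[Tuple[str, int], List[int]]:
    """Build one outcome vector per (subtype, threshold) pair."""
    thresholds = (32, 34, 37)
    return {
        (s, t): [subtype_label(d, s, t) for d in deliveries]
        for s in ("spontaneous", "iatrogenic")
        for t in thresholds
    }
```

A birth before 32 weeks is also before 34 and 37 weeks, so the number of cases can only shrink as the cut-off moves earlier. The <32-week models are therefore trained on the rarest outcomes.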

What is the implication of the main finding?
The development of PTB subtype-specific models allows for a more personalized risk assessment than using single risk factors, which is crucial as management strategies differ substantially for spontaneous and iatrogenic PTB. A high predicted risk of spontaneous PTB might lead to progesterone therapy, while a high risk for iatrogenic PTB would prompt intensified surveillance for conditions like pre-eclampsia and fetal growth restriction.

Accurate risk stratification enables the timely administration of interventions like antenatal corticosteroids and facilitates logistical planning for neonatal intensive care resources and potential transfers to specialized centers.

**Background/Objectives:** Preterm birth (PTB), defined as birth before 37 completed weeks of gestation, is a major global health challenge and a leading cause of neonatal mortality. PTB is broadly classified as spontaneous or medically indicated (iatrogenic), two subtypes with distinct etiologies. Although prediction is key to improving outcomes, models that specifically differentiate between the spontaneous and iatrogenic subtypes are lacking. This study aimed to develop and validate models to predict spontaneous and iatrogenic PTB at <32, <34, and <37 weeks' gestation using medical history and readily available second-trimester data.

**Methods:** This was a retrospective cohort study of singleton pregnancies from a single tertiary institution (2012–2025). Predictor variables included maternal characteristics, obstetric history, and second-trimester ultrasound markers. Four algorithms were trained: multivariable Logistic Regression and three machine learning methods (Random Forest, XGBoost, and a Neural Network). Each was then evaluated on a held-out test set (20% of the data). Model performance was primarily assessed by the area under the receiver operating characteristic curve (AUC).

**Results:** In total, 9805 singleton pregnancies were included. The models performed significantly better for iatrogenic PTB than for spontaneous PTB. For delivery <37 weeks, the highest AUC for iatrogenic PTB was 0.764 (Random Forest), while for spontaneous PTB it was 0.609 (Neural Network). Predictive accuracy improved for earlier gestations. For delivery <32 weeks, the best model for iatrogenic PTB achieved an AUC of 0.862 (Neural Network), and the best model for spontaneous PTB achieved an AUC of 0.749 (Random Forest). Model interpretation revealed that iatrogenic PTB was primarily driven by markers of placental dysfunction, such as ultrasound-estimated fetal weight and uterine artery pulsatility index. Spontaneous PTB was most strongly associated with a history of PTB and a short cervical length.

**Conclusions:** Models using routine mid-gestation data predict iatrogenic PTB effectively, with accuracy improving for earlier, more severe cases. In contrast, performance for spontaneous PTB was modest. Traditional Logistic Regression performed comparably to complex machine learning algorithms. This highlights that the clinical value is rooted in the subtype-specific modeling approach rather than in algorithmic complexity.
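The Methods evaluate each model on a held-out 20% test set using AUC. The sketch below illustrates both steps in stdlib Python under stated assumptions. Stratifying the split by outcome and the seed value are hypothetical choices, because the summary does not say how the split was drawn. AUC is computed through its equivalence with the Mann-Whitney U statistic: the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as one half.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    y: Sequence[int], test_frac: float = 0.2, seed: int = 42  # seed is hypothetical
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), holding out `test_frac` of each outcome class."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in (0, 1):
        idx = [i for i, v in enumerate(y) if v == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores order[i..j]
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    # U counts (case, non-case) pairs where the case scores higher, ties as 0.5
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

With scikit-learn installed, `sklearn.model_selection.train_test_split(..., test_size=0.2, stratify=y)` and `sklearn.metrics.roc_auc_score` provide the same operations.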

## Linked entities

- **Diseases:** pre-eclampsia (MONDO:0005081), fetal growth restriction (MONDO:0005030)

## Full-text entities

- **Diseases:** PTB (MESH:D047928), placental dysfunction (MESH:D010922)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12651481/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12651481/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12651481/full.md

---
Source: https://tomesphere.com/paper/PMC12651481