Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth
Shi Dong, Zlatan Feric, Guangyu Li, Chieh Wu, April Z. Gu, Jennifer Dy, John Meeker, Ingrid Y. Padilla, Jose Cordero, Carmen Velez Vega, Zaira Rosario, Akram Alshawabkeh, David Kaeli

TL;DR
This paper develops ensemble learning models that combine new missing-data handling and undersampling techniques to identify key factors contributing to preterm birth, improving sensitivity by 42% over previous methods.
Contribution
It introduces ensemble feature selection methods that handle both missing data and class imbalance, advancing the analysis of preterm birth factors.
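To make the undersampling idea concrete, here is a minimal sketch in plain Python (standard library only). It assumes binary labels with 1 marking a preterm birth and numeric features with no missing values, and it uses a simple standardized mean-difference score as a hypothetical stand-in for the paper's base learners, which this page does not specify. Each round keeps every minority-class row, draws an equally sized random subset of the majority class, and ranks the features on that balanced subset. The ensemble then averages ranks across rounds. The names `balanced_subsample`, `feature_score`, and `ensemble_rank` are illustrative, not the authors' code.

```python
import random
import statistics


def balanced_subsample(y, rng):
    """All minority-class row indices plus an equal-size random draw
    (without replacement) from the majority class."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


def feature_score(X, y, idx, j):
    """Hypothetical univariate score for feature j on the subsample idx:
    absolute difference of class means divided by the feature's spread."""
    col1 = [X[i][j] for i in idx if y[i] == 1]
    col0 = [X[i][j] for i in idx if y[i] == 0]
    spread = statistics.pstdev(col1 + col0) or 1.0  # guard against zero spread
    return abs(statistics.fmean(col1) - statistics.fmean(col0)) / spread


def ensemble_rank(X, y, n_rounds=25, seed=0):
    """Order features by mean rank across undersampled rounds
    (rank 0 means most discriminative within a round)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    rank_sums = [0] * n_features
    for _ in range(n_rounds):
        idx = balanced_subsample(y, rng)
        scores = [feature_score(X, y, idx, j) for j in range(n_features)]
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sums[j] += rank
    # Every feature is ranked in every round, so sorting by rank sum
    # is the same as sorting by mean rank.
    return sorted(range(n_features), key=lambda j: rank_sums[j])


# Synthetic, imbalanced toy data: 10 "preterm" (1) vs 90 "term" (0) rows.
# Feature 0 is shifted for the positive class; feature 1 is pure noise.
gen = random.Random(1)
y = [1] * 10 + [0] * 90
X = [[(5.0 if label else 0.0) + gen.gauss(0, 1), gen.gauss(0, 1)] for label in y]
print(ensemble_rank(X, y))
```

On this toy data the informative feature 0 should come out ahead of the noise feature. Averaging ranks rather than raw scores keeps a single round with unusually large scores from dominating, and replacing `feature_score` with a model-based importance leaves the undersampling loop unchanged.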
Findings
42% improvement in sensitivity over previous methods
Effective handling of missing data and class imbalance
Comparison of multiple ensemble feature selection techniques
Abstract
In this paper, we propose Ensemble Learning models to identify factors contributing to preterm birth. Our work leverages a rich dataset collected by an NIEHS P42 Center that aims to identify the dominant factors responsible for the high rate of premature births in northern Puerto Rico. We investigate analytical models addressing two major challenges present in the dataset: 1) a large amount of incomplete data, and 2) class imbalance. First, we leverage and compare two types of missing data imputation methods: 1) mean-based and 2) similarity-based, increasing the completeness of the dataset. Second, we propose a feature selection and evaluation model based on using undersampling with Ensemble Learning to address the class imbalance present in the dataset. We leverage and compare multiple Ensemble Feature Selection methods, including Complete Linear…
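The two imputation families named in the abstract can be contrasted with a short sketch. It assumes a row-major table in which `None` marks a missing value, and it uses k-nearest-neighbour imputation as one common similarity-based method. The abstract does not say which similarity measure the authors used, so `knn_impute`, its root-mean-square distance over co-observed features, and the parameter `k` are illustrative assumptions.

```python
import math
import statistics


def column_means(X):
    """Mean of the observed (non-None) entries in each column."""
    means = []
    for j in range(len(X[0])):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(statistics.fmean(observed) if observed else 0.0)
    return means


def mean_impute(X):
    """Mean-based imputation: replace each missing entry with its column mean."""
    means = column_means(X)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]


def distance(a, b):
    """Root-mean-square difference over features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap: treat the rows as maximally dissimilar
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(X, k=3):
    """Similarity-based imputation (hypothetical k-NN variant): fill each
    missing entry with the mean of that feature over the k most similar
    rows that observed it, falling back to the column mean."""
    means = column_means(X)
    out = []
    for i, row in enumerate(X):
        filled = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for r, other in enumerate(X)
                if r != i and other[j] is not None
            )
            nearest = [value for _, value in donors[:k]]
            filled[j] = statistics.fmean(nearest) if nearest else means[j]
        out.append(filled)
    return out


X = [
    [1.0, 10.0],
    [1.1, 11.0],
    [5.0, 50.0],
    [5.2, None],  # row with a missing second feature
]
print(round(mean_impute(X)[3][1], 3))  # 23.667 (mean of 10, 11, 50)
print(knn_impute(X, k=1)[3][1])        # 50.0 (nearest row is [5.0, 50.0])
```

In this toy table, mean imputation fills the gap with 23.667 because the column mixes two clusters, while the nearest-neighbour fill of 50.0 follows the row's own cluster. That kind of divergence is why comparing the two approaches is worthwhile.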
Taxonomy
Topics: Preterm Birth and Chorioamnionitis · Statistical Methods in Epidemiology · Neonatal and fetal brain pathology
