Feature Selection Approaches for Newborn Birthweight Prediction in Multiple Linear Regression Models
Esther Liu, Pei Xi Lin, Qianqi Wang, Karina Chen Feng

TL;DR
This study compares feature selection methods for multiple linear regression models to predict newborn birthweight using environmental and health data, aiming to identify the most predictive model with robust validation.
Contribution
It evaluates and compares forward selection, backward elimination, and stepwise selection using multiple criteria and extensive cross-validation to optimize birthweight prediction models.
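The paper does not publish its code. The following is a minimal Python/NumPy sketch of greedy AIC-driven selection in the three directions it compares, not the authors' implementation. Several details are assumptions: AIC is computed for an OLS fit as n*log(RSS/n) + 2k up to an additive constant, with k counting the intercept. Forward and stepwise ("both") searches start from the intercept-only model, and backward elimination starts from the full model. The data are synthetic, not the HELIX exposome data.

```python
import numpy as np

def aic(X, y, cols):
    """OLS AIC up to an additive constant: n*log(RSS/n) + 2k, k = coefficients incl. intercept."""
    n = len(y)
    design = np.hstack([np.ones((n, 1)), X[:, np.asarray(cols, dtype=int)]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def stepwise(X, y, direction="both"):
    """Greedy AIC selection: 'forward', 'backward', or 'both' (stepwise)."""
    p = X.shape[1]
    # Assumed starting points: full model for backward, intercept-only otherwise.
    selected = list(range(p)) if direction == "backward" else []
    current = aic(X, y, selected)
    while True:
        candidates = []
        if direction in ("forward", "both"):
            # Every model that adds one column not yet selected.
            candidates += [sorted(selected + [j]) for j in range(p) if j not in selected]
        if direction in ("backward", "both"):
            # Every model that drops one currently selected column.
            candidates += [[c for c in selected if c != j] for j in selected]
        if not candidates:
            break
        scores = [aic(X, y, c) for c in candidates]
        best = int(np.argmin(scores))
        if scores[best] >= current:  # stop when no single move lowers AIC
            break
        selected, current = candidates[best], scores[best]
    return selected, current

# Synthetic demo: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
n, p = 500, 6
X = rng.normal(size=(n, p))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

results = {d: stepwise(X, y, d) for d in ("forward", "backward", "both")}
for d, (cols, score) in results.items():
    print(f"{d:>8}: columns {cols}, AIC {score:.1f}")
```

The three directions can return different column sets on the same data, because each greedy path stops at a different local minimum of AIC. This is why the candidates still need to be compared afterwards.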
Findings
Optimal model identified using AIC, Adjusted R^2, and cross-validation.
Feature selection methods significantly impact model performance.
Restrictions on the model selection process are discussed.
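To make the first two comparison criteria concrete, here is a hedged sketch of scoring candidate models by AIC (lower is better) and adjusted R^2 (higher is better). Adjusted R^2 is computed as 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors excluding the intercept. The candidate column sets and the synthetic data are hypothetical placeholders, not the models or results reported in the paper.

```python
import numpy as np

def fit_ols(X, y, cols):
    """Fit OLS with an intercept on the chosen columns; return (RSS, number of coefficients)."""
    n = len(y)
    design = np.hstack([np.ones((n, 1)), X[:, np.asarray(cols, dtype=int)]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ beta) ** 2)), design.shape[1]

def aic(X, y, cols):
    """OLS AIC up to an additive constant: n*log(RSS/n) + 2k."""
    n = len(y)
    rss, k = fit_ols(X, y, cols)
    return n * np.log(rss / n) + 2 * k

def adjusted_r2(X, y, cols):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), p = predictors excluding intercept."""
    n = len(y)
    rss, k = fit_ols(X, y, cols)
    tss = float(np.sum((y - y.mean()) ** 2))
    p = k - 1
    return 1 - (rss / tss) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
n, p = 500, 6
X = rng.normal(size=(n, p))  # synthetic predictors, not HELIX data
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

# Hypothetical candidate column sets standing in for the selection results.
candidates = {"null": [], "true": [0, 2], "full": list(range(p))}
table = {name: (aic(X, y, cols), adjusted_r2(X, y, cols)) for name, cols in candidates.items()}
for name, (a, r2a) in table.items():
    print(f"{name:>5}: AIC {a:8.1f}  adjusted R^2 {r2a:.3f}")
```

Both criteria penalize extra predictors, but with different strengths. A model that wins on adjusted R^2 can therefore lose on AIC, and a held-out prediction check is a useful tiebreaker.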
Abstract
This project is based on the dataset "exposome_NA.RData", which contains a subcohort of 1301 mother-child pairs who were enrolled in the HELIX study during pregnancy. Several health outcomes were measured in the child at birth or at age 6-11 years, and environmental exposures of interest and other covariates were also recorded. This report outlines the process of obtaining the multiple linear regression (MLR) model with the best predictive power. We first obtain three candidate models from forward selection, backward elimination, and stepwise selection. We then select the optimal model using several comparison criteria: AIC, Adjusted R^2, and cross-validation repeated 8000 times. The report ends with additional findings revealed by the selected model, along with the limitations of the methods used in the model selection process.
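The abstract does not say which cross-validation scheme was repeated 8000 times. The sketch below assumes repeated random 80/20 train/test splits, with each candidate MLR model scored by mean squared prediction error (MSPE) on the held-out part. All candidates share the same splits in each repetition, so the comparison is paired, and the function also counts how often each model has the lowest error. The candidate sets, data, and the default n_reps=300 are hypothetical, chosen for speed. Passing n_reps=8000 would match the repetition count in the abstract.

```python
import numpy as np

def repeated_cv(X, y, candidates, n_reps=300, test_frac=0.2, seed=0):
    """Repeated random-split CV; returns {name: (mean MSPE, number of wins)}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))
    names = list(candidates)
    # Precompute one design matrix (intercept + chosen columns) per candidate.
    designs = [np.hstack([np.ones((n, 1)), X[:, np.asarray(candidates[m], dtype=int)]])
               for m in names]
    mspe = np.empty((n_reps, len(names)))
    for r in range(n_reps):
        perm = rng.permutation(n)  # one shared split per repetition (paired comparison)
        test, train = perm[:n_test], perm[n_test:]
        for m, design in enumerate(designs):
            beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
            resid = y[test] - design[test] @ beta
            mspe[r, m] = np.mean(resid ** 2)
    wins = np.bincount(mspe.argmin(axis=1), minlength=len(names))
    return {m: (float(mspe[:, i].mean()), int(wins[i])) for i, m in enumerate(names)}

rng = np.random.default_rng(2)
n, p = 400, 6
X = rng.normal(size=(n, p))  # synthetic predictors, not HELIX data
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

candidates = {"null": [], "true": [0, 2], "full": list(range(p))}
cv_results = repeated_cv(X, y, candidates)
for name, (err, wins) in cv_results.items():
    print(f"{name:>5}: mean MSPE {err:.3f}, lowest error in {wins} splits")
```

Reusing the same splits for every candidate removes split-to-split noise from the comparison. The win counts also show how stable the ranking is across repetitions, not just the average error.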
Taxonomy
Topics: Statistical Methods in Epidemiology · Air Quality Monitoring and Forecasting · Nutritional Studies and Diet
