AI-Derived Reproductive Phenotypes and Explainable ML for Concurrent Early Multimorbidity in U.S. Women: NHANES 2017-March 2020
Sunday A. Adetunji

TL;DR
This study uses explainable machine learning on NHANES data to identify reproductive and socioeconomic factors associated with early multimorbidity in U.S. women aged 20-44, emphasizing interpretability for clinical relevance.
Contribution
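It introduces a phenotyping approach that summarizes reproductive-history features with principal components analysis and k-means clustering, then compares logistic regression with XGBoost and uses SHAP values to keep the link between reproductive health and multimorbidity interpretable.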
Findings
XGBoost outperformed logistic regression in discrimination (ROC-AUC 0.766 vs 0.667)
Adverse reproductive factors are strongly clustered with early multimorbidity
Key drivers include age, PHQ-9 score, income, race, education, and reproductive index
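For readers unfamiliar with the discrimination metric quoted above, ROC-AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting as half. The following is a minimal sketch of that pairwise definition, not the study's evaluation code; the function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not NHANES data.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) ROC-AUC: share of positive/negative pairs
    in which the positive case gets the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, for illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

Under this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases.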
Abstract
Background: Adverse reproductive history is a multisystemic risk factor, but evidence is constrained by isolated outcome studies, limited adjustment, and non-interpretable algorithmic models. We re-frame the estimand from prediction to concurrent risk classification and emphasize calibration, interpretability, and systematic error. Methods: We analyzed 1,602 U.S. women aged 20-44 years from NHANES 2017-March 2020 with reproductive-history variables, chronic-condition indicators, and PHQ-9 data. Restricted multimorbidity was defined as at least two of hypertension, hypercholesterolemia, cardiovascular disease, kidney disease, and kidney stones. Features were summarized using principal components analysis and k-means clustering. We compared multivariable logistic regression with XGBoost and used SHAP values to quantify contributions. Results: Early multimorbidity occurred in 6.6%…
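The outcome definition in the Methods (at least two of five conditions) can be sketched as a simple count over binary indicators. This is a hedged illustration rather than the author's code: the field names, the helper `has_restricted_multimorbidity`, the choice to treat a missing indicator as absent, and the toy records are all assumptions made for the example.

```python
# Hypothetical field names for the five conditions named in the abstract.
CONDITIONS = (
    "hypertension",
    "hypercholesterolemia",
    "cardiovascular_disease",
    "kidney_disease",
    "kidney_stones",
)


def has_restricted_multimorbidity(record, threshold=2):
    """Return True when at least `threshold` of the five conditions are present.

    Assumption: a missing indicator is treated as absent; the study may
    handle missing data differently.
    """
    count = sum(1 for name in CONDITIONS if record.get(name, False))
    return count >= threshold


# Toy records (made-up, not NHANES data).
participants = [
    {"hypertension": True},
    {"hypertension": True, "kidney_stones": True},
    {},
    {"hypercholesterolemia": True, "kidney_disease": True, "kidney_stones": True},
]

flags = [has_restricted_multimorbidity(p) for p in participants]
toy_prevalence = sum(flags) / len(flags)
print(flags, toy_prevalence)
```

The resulting binary flag is the kind of target a classifier such as logistic regression or XGBoost would then be fit to.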
