Multimodal Survival Modeling and Fairness-Aware Clinical Machine Learning for 5-Year Breast Cancer Risk Prediction
Toktam Khatibi

TL;DR
This paper develops a multimodal machine learning framework for 5-year breast cancer survival prediction that integrates clinical and high-dimensional molecular data and emphasizes calibration, fairness, robustness, and reproducibility.
Contribution
It introduces a comprehensive, reproducible multimodal survival modeling framework that combines clinical and molecular data with fairness and calibration considerations.
Findings
CoxNet achieved a validation AUC of 0.983 and a test AUC of 0.966.
XGBoost achieved a validation AUC of 0.986 and a test AUC of 0.925.
Fairness diagnostics showed stable discrimination across subgroups.
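The subgroup fairness check summarized above can be illustrated with a small sketch. It assumes survival time is recorded in months, that each patient gets a binary 5-year label (death by 60 months = 1, alive past 60 months = 0), and that patients censored before 60 months are simply excluded. That exclusion rule, the record layout, and the function names `five_year_label`, `auc`, and `subgroup_auc` are hypothetical illustrations, not the paper's documented procedure. AUC is computed per subgroup with the Mann-Whitney pairwise estimate, so a large gap between groups would signal unequal discrimination.

```python
from collections import defaultdict

HORIZON_MONTHS = 60.0  # 5 years, assuming survival time is recorded in months


def five_year_label(time, event, horizon=HORIZON_MONTHS):
    """Return 1 (died by horizon), 0 (alive past horizon), or None if censored earlier."""
    if event and time <= horizon:
        return 1
    if time > horizon:
        return 0
    return None  # censored before the horizon: 5-year status is unknown


def auc(scores, labels):
    """Mann-Whitney estimate of ROC AUC; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc(records):
    """records: iterable of (risk_score, time, event, group). Returns {group: AUC}."""
    by_group = defaultdict(lambda: ([], []))
    for score, time, event, group in records:
        y = five_year_label(time, event)
        if y is None:
            continue  # hypothetical rule: drop patients censored before 5 years
        scores, labels = by_group[group]
        scores.append(score)
        labels.append(y)
    return {g: auc(s, l) for g, (s, l) in by_group.items()}


# Toy usage with made-up risk scores (not data from the paper).
records = [
    (0.9, 20, 1, "A"), (0.2, 80, 0, "A"), (0.4, 70, 1, "A"),
    (0.7, 30, 1, "B"), (0.8, 90, 0, "B"), (0.1, 40, 0, "B"),
]
per_group = subgroup_auc(records)
print(per_group)
```

Dropping early-censored patients keeps the sketch short but can bias the estimate; an inverse-probability-of-censoring-weighted, time-dependent AUC is a common alternative when censoring is substantial.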
Abstract
Clinical risk prediction models often underperform in real-world settings due to poor calibration, limited transportability, and subgroup disparities. These challenges are amplified in high-dimensional multimodal cancer datasets characterized by complex feature interactions and a p >> n structure (far more features than patients). We present a fully reproducible multimodal machine learning framework for 5-year overall survival prediction in breast cancer, integrating clinical variables with high-dimensional transcriptomic and copy-number alteration (CNA) features from the METABRIC cohort. After variance- and sparsity-based filtering and dimensionality reduction, models were trained using stratified train/validation/test splits with validation-based hyperparameter tuning. Two survival approaches were compared: an elastic-net regularized Cox model (CoxNet) and a gradient-boosted survival tree model implemented using…
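The preprocessing steps named in the abstract (variance- and sparsity-based filtering, then a stratified train/validation/test split) can be sketched as follows. The thresholds (`min_variance=0.01`, `max_zero_fraction=0.9`), the 60/20/20 split fractions, the choice of stratifying on a binary outcome label, and the function names are all assumptions for illustration; the summary does not report the paper's actual values, and the dimensionality-reduction step is omitted here.

```python
import random
import statistics


def filter_features(columns, min_variance=0.01, max_zero_fraction=0.9):
    """Keep feature columns with enough variance that are not dominated by zeros.

    columns: dict mapping feature name -> list of numeric values (one per patient).
    Thresholds are hypothetical defaults, not values reported by the paper.
    """
    kept = {}
    for name, values in columns.items():
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction > max_zero_fraction:
            continue  # sparsity filter: mostly-zero feature (e.g. a rarely altered CNA gene)
        if statistics.pvariance(values) < min_variance:
            continue  # variance filter: near-constant feature
        kept[name] = values
    return kept


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split patient indices into train/validation/test, preserving label proportions."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_label = {}
    for i, y in enumerate(labels):
        by_label.setdefault(y, []).append(i)
    train, val, test = [], [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test split
    return sorted(train), sorted(val), sorted(test)


# Toy usage with synthetic columns and labels (not METABRIC data).
cols = {
    "const": [1.0] * 20,
    "sparse": [0.0] * 19 + [5.0],
    "good": [float(i) for i in range(20)],
}
kept = filter_features(cols)
labels = [0] * 10 + [1] * 10
train_idx, val_idx, test_idx = stratified_split(labels)
print(sorted(kept), len(train_idx), len(val_idx), len(test_idx))
```

In a leakage-safe pipeline, the filtering statistics would be computed on the training split only and then applied unchanged to the validation and test splits.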
Taxonomy
Topics: AI in cancer detection · Breast Cancer Treatment Studies · Radiomics and Machine Learning in Medical Imaging
