XGBoost-Powered Digital Twins Leverage Routine Blood Tests for Early Detection of Cancer and Cardiovascular Disease
Lo Kai Shun John, Riya Nagar, Abicumaran Uthamacumaran, Hector Zenil

TL;DR
This study develops "digital blood twins": machine learning models that use routine blood tests and demographic data for scalable, low-cost early detection of cancer and cardiovascular disease. Reported ROC-AUC reaches 0.993 for colorectal cancer and about 0.813 for cardiovascular disease, with SHAP used for interpretability.
Contribution
The paper introduces XGBoost-powered digital blood twins that use routine blood tests for scalable disease screening, presented as an accessible approach to early detection.
Findings
Colorectal cancer prediction ROC-AUC up to 0.993
Cardiovascular disease models show ROC-AUC around 0.813
Models demonstrate potential for scalable, low-cost screening
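For readers less familiar with the metric reported above: ROC-AUC can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The following is a minimal sketch in plain Python that computes it by comparing every positive/negative pair. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = disease, 0 = no disease.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples, so it is only suitable for illustration; on a cohort the size of the one described here, a rank-based implementation from an established library would be the practical choice.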
Abstract
Early detection of cancer and cardiovascular diseases is fundamental to improving patient outcomes and reducing healthcare expenditure. Current cancer screening programs are targeted towards specific cancers and are often inaccessible to large parts of the population, particularly in remote regions. This project aimed to develop digital blood twins: machine learning models that leverage routinely collected blood test data, demographics, comorbidities, and prescribed medications, for scalable and cost-effective disease screening. Digital blood twins were constructed using the UK Biobank dataset (n = 373,269). Using age, sex, comorbidities, medication profiles, and blood test z-scores, three iterations of XGBoost classifiers were trained for broad cancer, colorectal cancer, and cardiovascular disease prediction. Model interpretability was achieved through SHAP and dimensionality reduction…
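The abstract lists blood test z-scores among the model inputs but does not say which reference population they are computed against. As a rough sketch only, assuming each test is standardized against the mean and standard deviation of the cohort itself, the transformation could look like the following. The function name `zscores` and the haemoglobin values are hypothetical and chosen purely for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread, so every z-score is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Invented haemoglobin measurements in g/dL (not UK Biobank data).
haemoglobin = [13.0, 14.0, 15.0, 16.0, 12.0]
hb_z = zscores(haemoglobin)

for raw, z in zip(haemoglobin, hb_z):
    print(f"{raw:.1f} g/dL -> z = {z:+.3f}")
```

Expressing each test as a z-score puts measurements with different units on a common scale, which also makes feature attributions such as SHAP values easier to compare across tests. In practice the reference statistics might instead be stratified by age and sex; that choice is not stated in the text above.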
Taxonomy
Topics: Ferroptosis and cancer prognosis · Cancer Genomics and Diagnostics · Artificial Intelligence in Healthcare and Education
