TwinWeaver: An LLM-Based Foundation Model Framework for Pan-Cancer Digital Twins
Nikita Makarov, Maria Bordukova, Lena Voith von Voithenberg, Estrella Pivel-Villanueva, Sabrina Mielke, Jonathan Wickes, Hanchen Wang, Mingyu Derek Ma, Keunwoo Choi, Kyunghyun Cho, Stephen Ra, Raul Rodriguez-Esteban, Fabian Schmich, Michael Menden

TL;DR
TwinWeaver is a novel framework that leverages large language models to improve forecasting and risk stratification in pan-cancer clinical data, enabling more accurate and interpretable digital twins for precision oncology.
Contribution
The paper introduces TwinWeaver, a unified LLM-based framework for modeling sparse, multi-modal clinical time series, and demonstrates its effectiveness by building Genie Digital Twin (GDT), a model for cancer prognosis.
Findings
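Significantly reduces forecasting error: median MASE of 0.87 versus 0.97 for the strongest time-series baseline (p<0.001).
Improves risk stratification: average C-index of 0.703 across survival, progression, and therapy switching tasks, versus 0.662 for the best baseline.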
Generalizes well to out-of-distribution clinical trials, outperforming baselines.
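For readers unfamiliar with the two metrics in these findings, the sketch below shows how MASE and the C-index are conventionally computed. It assumes the standard textbook definitions (a one-step naive forecast as the MASE scale, and Harrell's pairwise C-index with risk ties counted as 0.5); the paper's exact evaluation protocol may differ, and the function names `mase` and `concordance_index` are illustrative, not part of TwinWeaver.

```python
from typing import Sequence


def mase(history: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float]) -> float:
    """Mean Absolute Scaled Error (assumed standard definition).

    The forecast's mean absolute error is divided by the in-sample mean
    absolute error of a one-step naive forecast on `history`.
    Values below 1 mean the forecast beats the naive baseline.
    """
    if len(history) < 2:
        raise ValueError("history needs at least two points")
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    naive_mae = sum(
        abs(history[i] - history[i - 1]) for i in range(1, len(history))
    ) / (len(history) - 1)
    if naive_mae == 0:
        raise ValueError("history is constant, so the scale is undefined")
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


def concordance_index(times: Sequence[float], events: Sequence[bool],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index (assumed definition, O(n^2) for clarity).

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's time. The pair is concordant when the
    model assigned i the higher risk; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Under these definitions, a MASE of 0.87 means the forecast error is 13% lower than that of the naive scale, and a C-index of 0.5 corresponds to random ranking while 1.0 is a perfect ranking of risk.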
Abstract
Precision oncology requires forecasting clinical events and trajectories, yet modeling sparse, multi-modal clinical time series remains a critical challenge. We introduce TwinWeaver, an open-source framework that serializes longitudinal patient histories into text, enabling unified event prediction as well as forecasting with large language models, and use it to build Genie Digital Twin (GDT) on 93,054 patients across 20 cancer types. In benchmarks, GDT significantly reduces forecasting error, achieving a median Mean Absolute Scaled Error (MASE) of 0.87 compared to 0.97 for the strongest time-series baseline (p<0.001). Furthermore, GDT improves risk stratification, achieving an average concordance index (C-index) of 0.703 across survival, progression, and therapy switching tasks, surpassing the best baseline of 0.662. GDT also generalizes to out-of-distribution clinical trials, matching…
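The core idea of serializing a longitudinal patient history into text can be illustrated with a minimal sketch. Everything here is hypothetical: the `ClinicalEvent` schema, the `serialize_history` function, the "Day N: [category] ..." line format, and the example values are assumptions made for illustration, not TwinWeaver's actual data model or prompt format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class ClinicalEvent:
    """Hypothetical record for one entry in a patient timeline."""
    day: date
    category: str  # e.g. "lab", "drug", "diagnosis"
    name: str
    value: str = ""  # empty for events that carry no measurement


def serialize_history(events: Iterable[ClinicalEvent]) -> str:
    """Render events as chronologically ordered text lines.

    Dates are converted to day offsets from the earliest event, so
    irregularly spaced (sparse) observations stay explicit in the text.
    """
    ordered = sorted(events, key=lambda e: (e.day, e.category, e.name))
    if not ordered:
        return ""
    start = ordered[0].day
    lines = []
    for event in ordered:
        offset = (event.day - start).days
        detail = f"{event.name} = {event.value}" if event.value else event.name
        lines.append(f"Day {offset}: [{event.category}] {detail}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    history = [
        ClinicalEvent(date(2020, 1, 15), "lab", "hemoglobin", "11.2 g/dL"),
        ClinicalEvent(date(2020, 1, 1), "diagnosis", "NSCLC stage IV"),
    ]
    print(serialize_history(history))
```

A text representation like this lets a single language model consume labs, treatments, and diagnoses together, which is what makes it possible to phrase both event prediction and numeric forecasting as text continuation.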
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging
