Multimodal Cancer Modeling in the Age of Foundation Model Embeddings

Steven Song; Morgan Borjigin-Wang; Irene Madejski; Robert L. Grossman

arXiv:2505.07683·cs.LG·May 11, 2026

TL;DR

This paper trains classical machine learning models on zero-shot foundation model embeddings of multimodal cancer data from TCGA, including genomics, imaging, and pathology reports. It reports improved predictive performance from multimodal fusion and examines the benefits of including pathology report text and of text summarization.

Contribution

It introduces an embedding-centric approach to multimodal cancer modeling that leverages foundation models for improved integration and analysis of diverse data types.

Findings

1. Multimodal fusion with foundation model embeddings outperforms unimodal models.
2. Including pathology report text enhances model performance.
3. Model-based text summarization and hallucination impact results significantly.

Abstract

The Cancer Genome Atlas (TCGA) has enabled novel discoveries and served as a large-scale reference dataset in cancer through its harmonized genomics, clinical, and imaging data. Numerous prior studies have developed bespoke deep learning models over TCGA for tasks such as cancer survival prediction. A modern paradigm in biomedical deep learning is the development of foundation models (FMs) to derive feature embeddings agnostic to a specific modeling task. Biomedical text especially has seen growing development of FMs. While TCGA contains free-text data as pathology reports, these have been historically underutilized. Here, we investigate the ability to train classical machine learning models over multimodal, zero-shot FM embeddings of cancer data. We demonstrate the ease and additive effect of multimodal fusion, outperforming unimodal models. Further, we show the benefit of including…
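
The idea of training a classical model over fused, precomputed embeddings can be illustrated with a minimal sketch. It is not the authors' pipeline: the embeddings are synthetic stand-ins for frozen foundation model outputs, the modality names and dimensions in `DIMS` are hypothetical, the binary label is a placeholder for a clinical endpoint, and a nearest-centroid classifier stands in for whichever classical model one would actually use. Fusion here is assumed to be simple concatenation of per-modality vectors after L2 normalization.

```python
import random

def l2_normalize(vec):
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(*modality_vectors):
    """Early fusion (assumed here): normalize each embedding, then concatenate."""
    fused = []
    for vec in modality_vectors:
        fused.extend(l2_normalize(vec))
    return fused

def fit_centroids(features, labels):
    """Nearest-centroid classifier: store the mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda y: sq_dist(centroids[y], x))

def accuracy(centroids, features, labels):
    hits = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def fake_embedding(dim, label, signal):
    """Synthetic stand-in for a frozen foundation model embedding."""
    return [rng.gauss(signal * label, 1.0) for _ in range(dim)]

# Hypothetical modalities and embedding sizes, chosen only for illustration.
DIMS = {"expression": 16, "histology": 8, "report_text": 12}
labels = [i % 2 for i in range(200)]  # placeholder binary outcome per patient
embeddings = {
    name: [fake_embedding(dim, y, 0.5) for y in labels]
    for name, dim in DIMS.items()
}

# One fused vector per patient, built from that patient's three embeddings.
fused = [fuse(*vecs) for vecs in zip(*embeddings.values())]

split = 150  # first 150 patients for fitting, the remaining 50 held out

def evaluate(features):
    model = fit_centroids(features[:split], labels[:split])
    return accuracy(model, features[split:], labels[split:])

scores = {name: evaluate(feats) for name, feats in embeddings.items()}
scores["fused"] = evaluate(fused)
for name, score in scores.items():
    print(f"{name}: held-out accuracy = {score:.2f}")
```

Because the embeddings are computed once and frozen, adding a modality only lengthens the feature vector; the downstream model and training loop stay the same. The scores printed on this synthetic data say nothing about the paper's results.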
