An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation
Xiaofeng Liu, Qianru Zhang, Thibault Marin, Menghua Xia, Chi Liu, Georges El Fakhri, Jinsong Ouyang

TL;DR
This paper introduces an open-source, multi-center FDG PET/CT foundation model that combines hierarchical UNet-shaped backbones with masked autoencoding to improve label efficiency in tumor segmentation and to learn joint cross-modality representations.
Contribution
It presents a novel multi-center foundation model with early cross-modal interaction and a masked autoencoding objective, enhancing label efficiency and generalization in PET/CT tumor segmentation.
Findings
Achieves comparable performance with only 10% of the labeled data.
Outperforms separated-modality pretraining in 5-shot linear probing.
Demonstrates robust cross-modality representation learning.
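Linear probing is the evaluation protocol behind the second finding. The pretrained encoder is frozen, and only a linear classifier is trained on its features using a handful of labeled examples. The sketch below illustrates that general protocol as per-voxel logistic regression on fixed feature vectors. It is not the paper's evaluation code. The feature extractor is replaced by synthetic data, and `train_linear_probe` and `predict` are hypothetical helpers. How the paper defines a "shot" (for example, a labeled scan) is not stated on this page.

```python
import numpy as np

def train_linear_probe(features, labels, lr=0.5, steps=500):
    """Fit a logistic-regression head on frozen per-voxel features.

    features: (n_voxels, n_channels) array from a frozen encoder (hypothetical here).
    labels:   (n_voxels,) array with values in {0, 1} (1 = tumor voxel).
    Only w and b are learned; the features themselves are never updated.
    """
    n, c = features.shape
    w = np.zeros(c)
    b = 0.0
    for _ in range(steps):
        logits = features @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid probabilities
        grad = p - labels                   # gradient of mean binary cross-entropy w.r.t. logits
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    return w, b

def predict(features, w, b):
    """Binary voxel prediction from the linear head (threshold at probability 0.5)."""
    return (features @ w + b) > 0.0

# Synthetic stand-in for frozen encoder features and voxel labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))
true_w = rng.normal(size=8)
y = (X @ true_w > 0).astype(float)

w, b = train_linear_probe(X, y)
accuracy = float((predict(X, w, b) == (y > 0.5)).mean())
```

Because the encoder stays frozen, the probe's accuracy mainly reflects how linearly separable the pretrained features already are. That is why linear probing is a common way to compare pretraining strategies.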
Abstract
The synergistic interpretation of anatomical information from computed tomography (CT) and metabolic information from positron emission tomography (PET) is important to oncologic imaging. However, existing deep learning methods for PET/CT remain largely task-specific, are often trained on single-center cohorts, or adopt dual-branch fusion schemes that delay cross-modal interaction and underutilize early spatial correspondence between PET and CT. To address these limitations, we present an open-source, multi-center, whole-body FDG PET/CT foundation model utilizing 4,997 harmonized scans from four public datasets. Our framework employs hierarchical UNet-shaped backbones with early channel-wise concatenation, enabling anatomical and metabolic features to interact from the first embedding layer onward. We further introduce a masked autoencoding objective based on zero-mean imputation,…
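The two mechanisms named in the abstract are early channel-wise concatenation and masked autoencoding with zero-mean imputation. They can be illustrated at the input and loss level with the minimal NumPy sketch below. The sketch rests on several assumptions that are not the authors' implementation:

- PET and CT are already co-registered on the same voxel grid and z-score normalized per volume, so that 0 equals the mean intensity.
- "Zero-mean imputation" is read as filling masked patches with that zero value.
- One spatial mask of cubic patches is shared by both channels.
- The reconstruction loss is mean squared error over masked voxels only.

All function names, the patch size, and the mask ratio are hypothetical.

```python
import numpy as np

def zscore(volume: np.ndarray) -> np.ndarray:
    """Per-volume z-score normalization, so the mean intensity maps to 0."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)

def early_concat(pet: np.ndarray, ct: np.ndarray) -> np.ndarray:
    """Stack co-registered PET and CT along a channel axis -> (2, D, H, W).

    The first network layer then sees both modalities at every voxel,
    instead of fusing two separate branches later on.
    """
    assert pet.shape == ct.shape, "PET and CT must share the same voxel grid"
    return np.stack([zscore(pet), zscore(ct)], axis=0)

def random_patch_mask(spatial_shape, patch, ratio, rng):
    """Voxel-level boolean mask (True = masked) from randomly chosen cubic patches."""
    assert all(s % patch == 0 for s in spatial_shape), "shape must be divisible by patch"
    grid = tuple(s // patch for s in spatial_shape)
    n_patches = int(np.prod(grid))
    n_masked = int(round(ratio * n_patches))
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    coarse = flat.reshape(grid)
    # Expand each patch-level flag to a patch x patch x patch block of voxels.
    return coarse.repeat(patch, axis=0).repeat(patch, axis=1).repeat(patch, axis=2)

def zero_mean_impute(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Fill masked voxels in every channel with 0 (the per-channel mean after z-scoring)."""
    return np.where(mask[None], 0.0, x)

def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Reconstruction loss restricted to masked voxels in all channels."""
    m = np.broadcast_to(mask[None], target.shape)
    return float(((pred - target) ** 2)[m].mean())

# Usage with synthetic volumes standing in for real PET/CT data.
rng = np.random.default_rng(0)
pet = rng.gamma(2.0, 1.0, size=(32, 32, 32))    # stand-in for SUV-like values
ct = rng.normal(0.0, 300.0, size=(32, 32, 32))  # stand-in for HU-like values
x = early_concat(pet, ct)                       # (2, 32, 32, 32)
mask = random_patch_mask(x.shape[1:], patch=8, ratio=0.75, rng=rng)
x_in = zero_mean_impute(x, mask)                # what the encoder would receive
# With a real network: pred = model(x_in); loss = masked_mse(pred, x, mask)
```

Imputing with the normalized mean avoids introducing an out-of-distribution constant into the input. Restricting the loss to masked voxels forces the network to infer hidden regions from visible context in both modalities.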
