# Interpretable and Performant Multimodal Nasopharyngeal Carcinoma GTV Segmentation with Clinical Priors Guided 3D-Gaussian-Prompted Diffusion Model (3DGS-PDM)

**Authors:** Jiarui Zhu, Zongrui Ma, Ge Ren, Jing Cai

PMC · DOI: 10.3390/cancers17223660 · Cancers · 2025-11-14

## TL;DR

This paper introduces a new 3D-Gaussian-prompted diffusion model for accurate and interpretable tumor segmentation in nasopharyngeal carcinoma using multimodal imaging.

## Contribution

The novel use of 3D Gaussian representations and a Gaussian-prompted diffusion model for interpretable multimodal medical image segmentation.

## Key findings

- The proposed model achieves a mean Dice similarity coefficient (DSC) of 84.29 for primary tumor (GTVp) segmentation and 79.25 for metastatic tumor (GTVnd) segmentation.
- The method outperforms five state-of-the-art models in multimodal segmentation accuracy and interpretability.
- The model's step-wise diffusion process allows traceable integration of clinical priors from multimodal imaging inputs.

## Abstract

This is the first study to utilize 3D Gaussian representations and a Gaussian-prompted diffusion model for performant and interpretable multimodal medical image segmentation. Our proposed 3D Gaussian-prompted diffusion model addresses two long-standing challenges in this area: (1) limited accuracy caused by heavy information redundancy and (2) poor interpretability caused by unreliable information extraction and integration from multimodal inputs. Extensive experiments demonstrate that our proposed method not only clearly improves multimodal segmentation of the gross tumor volume for nasopharyngeal carcinoma but also performs segmentation through an interpretable step-wise diffusion process in which the contribution of prior guidance from each multimodal imaging input is traceable.

Background: Gross tumor volume (GTV) segmentation of nasopharyngeal carcinoma (NPC) crucially determines the precision of image-guided radiation therapy (IGRT) for NPC. Compared to other cancers, the clinical delineation of NPC is especially challenging due to its capricious infiltration of the adjacent rich tissues and bones, and it routinely requires multimodal information from CT and MRI series to identify the ambiguous tumor boundary. However, conventional deep learning-based multimodal segmentation methods suffer from limited prediction accuracy and frequently perform no better than, or even worse than, single-modality segmentation models. This limited multimodal performance indicates defective information extraction and integration from the input channels. This study aims to develop a 3D-Gaussian-Prompted Diffusion Model (3DGS-PDM) for more clinically targeted information extraction and more effective multimodal information integration, thereby enabling more accurate and clinically interpretable GTV segmentation for NPC.

Methods: We propose a 3D-Gaussian-Prompted Diffusion Model (3DGS-PDM) that performs NPC tumor contouring through a stepwise process guided by multimodal clinical priors. The proposed model contains two modules: a Gaussian Initialization Module that uses a 3D-Gaussian-Splatting technique to distill 3D-Gaussian representations based on clinical priors from CT, MRI-t2 and MRI-t1-contrast-enhanced-fat-suppression (MRI-t1-cefs), respectively, and a Diffusion Segmentation Module that generates the tumor segmentation step by step from the fused 3D-Gaussian prompts. We retrospectively collected paired CT and MRI series with clinical GTV annotations from 600 NPC patients at four hospitals, and divided the dataset into 480 training volumes and 120 testing volumes.

Results: Our proposed method achieves a mean Dice similarity coefficient (DSC) of 84.29 ± 7.33, a mean average symmetric surface distance (ASSD) of 1.31 ± 0.63, and a 95th-percentile Hausdorff distance (HD95) of 4.76 ± 1.98 on primary NPC tumor (GTVp) segmentation, and a DSC of 79.25 ± 10.01, an ASSD of 1.19 ± 0.72, and an HD95 of 4.76 ± 1.71 on metastatic NPC tumor (GTVnd) segmentation. Comparative experiments further demonstrate that our method significantly improves multimodal segmentation performance on NPC tumors, outperforming five other state-of-the-art methods. Visual evaluation of the segmentation prediction process and a three-step ablation study on the input channels further demonstrate the interpretability of our proposed method.

Conclusions: This study proposes a performant and interpretable multimodal segmentation method for the GTV of NPC, contributing to improved precision in NPC treatment.
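
For readers unfamiliar with the headline metric, the sketch below shows how a Dice similarity coefficient is conventionally computed for a pair of binary 3D masks and then summarized as mean ± standard deviation. It is not the authors' evaluation code. The function names `dice_coefficient` and `summarize_percent`, the toy volumes, the empty-mask convention, the scaling to percent, and the use of the population standard deviation are all assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    total = int(pred.sum()) + int(target.sum())
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = int(np.logical_and(pred, target).sum())
    return 2.0 * overlap / total


def summarize_percent(scores):
    """Mean and population standard deviation of per-case scores, in percent.

    Whether the paper uses the population or sample standard deviation
    is not stated in the abstract; ddof=0 here is an assumption.
    """
    scores = 100.0 * np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std())


# Toy volumes (hypothetical data, not from the study).
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3, 1:3, 1:3] = True   # 8 foreground voxels
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True     # 12 foreground voxels, 8 of them overlapping

dsc = dice_coefficient(pred, target)   # 2 * 8 / (8 + 12) = 0.8
mean_pct, std_pct = summarize_percent([dsc, 1.0])
print(f"DSC = {dsc:.2f}; summary = {mean_pct:.2f} ± {std_pct:.2f}")
```

A value of 1.0 means the predicted and reference contours coincide voxel for voxel, and 0.0 means they do not overlap at all. ASSD and HD95 complement the DSC by measuring boundary distances rather than volume overlap.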

## Linked entities

- **Diseases:** Nasopharyngeal Carcinoma (MONDO:0015459), NPC (MONDO:0011775)

## Full-text entities

- **Diseases:** NPC (MESH:D000077274), cancers (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12651726/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12651726/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12651726/full.md

---
Source: https://tomesphere.com/paper/PMC12651726