Vision-Language Models for Automated 3D PET/CT Report Generation

Wenpei Jiao; Kun Shang; Hui Li; Ke Yan; Jiajin Zhang; Guangjie Yang; Lijuan Guo; Yan Wan; Xing Yang; Dakai Jin; Zhaoheng Xie

arXiv:2511.20145 · cs.CV · November 26, 2025

Open Access

TL;DR

This paper introduces PETRG-3D, a 3D dual-branch framework for automated PET/CT report generation that addresses the challenges of functional imaging and inter-hospital variability in reporting practices, and evaluates it on newly constructed datasets with a new clinical evaluation protocol.

Contribution

The paper presents PETRG-3D, a new end-to-end 3D model for PET/CT report generation, along with new datasets and a clinical evaluation protocol, advancing disease-aware reasoning in medical AI.

Findings

01

PETRG-3D outperforms existing methods on natural language metrics.

02

The model improves clinical efficacy metrics by 8.18%.

03

Style-adaptive prompts enhance reporting consistency across hospitals.

Abstract

Positron emission tomography/computed tomography (PET/CT) is essential in oncology, yet the rapid expansion of scanners has outpaced the availability of trained specialists, making automated PET/CT report generation (PETRG) increasingly important for reducing clinical workload. Compared with structural imaging (e.g., X-ray, CT, and MRI), functional PET poses distinct challenges: metabolic patterns vary with tracer physiology, and whole-body 3D contextual information is required rather than local-region interpretation. To advance PETRG, we propose PETRG-3D, an end-to-end 3D dual-branch framework that separately encodes PET and CT volumes and incorporates style-adaptive prompts to mitigate inter-hospital variability in reporting practices. We construct PETRG-Lym, a multi-center lymphoma dataset collected from four hospitals (824 reports w/ 245,509 paired PET/CT slices), and construct…

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Multimodal Machine Learning Applications · Topic Modeling