PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting

Danyal Maqbool; Changhee Lee; Zachary Huemann; Samuel D. Church; Matthew E. Larson; Scott B. Perlman; Tomas A. Romero; Joshua D. Warner; Meghan Lubner; Xin Tie; Jameson Merkow; Junjie Hu; Steve Y. Cho; Tyler J. Bradshaw

arXiv:2510.27680 · cs.CV · December 2, 2025

TL;DR

This paper introduces PETAR-4B, a 3D vision-language model, and PETARSeg-11K, a large-scale dataset, for automated PET report generation. Together they target the complexity of whole-body 3D imaging and the wide range of clinical findings, with the aim of improving accuracy and clinical relevance.

Contribution

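The paper presents PETAR-4B, a 3D vision-language model for mask-aware, spatially grounded PET/CT reporting, and PETARSeg-11K, described as the first large-scale, publicly available dataset providing lesion-level correspondence between 3D PET/CT volumes and free-text radiological findings.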

Findings

01. PETAR-4B outperforms existing 2D and 3D baselines in automated metrics.

02. A human study with physicians confirms the model's clinical utility.

03. The dataset enables detailed lesion-level analysis in PET/CT reports.

Abstract

Generating automated reports for 3D positron emission tomography (PET) is an important and challenging task in medical imaging. PET plays a vital role in oncology, but automating report generation is difficult due to the complexity of whole-body 3D volumes, the wide range of potential clinical findings, and the limited availability of annotated datasets. To address these challenges, we first introduce PETARSeg-11K, the first large-scale, publicly available dataset that provides lesion-level correspondence between 3D PET/CT volumes and free-text radiological findings. It comprises 11,356 lesion descriptions paired with 3D segmentations. Second, we propose PETAR-4B, a 3D vision-language model designed for mask-aware, spatially grounded PET/CT reporting. PETAR-4B jointly encodes PET, CT, and 3D lesion segmentation masks, using a 3D focal prompt to capture fine-grained details of lesions that…
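The abstract describes jointly encoding PET, CT, and a 3D lesion mask with a focus on the lesion region. As a rough illustration only, the sketch below shows one way such an input could be assembled with NumPy: crop all three volumes to the lesion's bounding box plus a margin, then stack them as channels. The function name `focal_crop`, the `margin` parameter, and the bounding-box cropping strategy are assumptions made for this example; the visible text does not specify how PETAR-4B's 3D focal prompt is actually constructed.

```python
import numpy as np


def focal_crop(pet, ct, mask, margin=2):
    """Hypothetical helper (not from the paper): crop PET, CT, and the
    lesion mask to the lesion's bounding box plus a voxel margin, then
    stack them as channels."""
    if pet.shape != ct.shape or pet.shape != mask.shape:
        raise ValueError("PET, CT, and mask must share one 3D shape")

    # Coordinates of all lesion voxels, shape (n_voxels, 3).
    coords = np.argwhere(mask > 0)
    if coords.size == 0:
        raise ValueError("mask contains no lesion voxels")

    # Bounding box expanded by `margin` and clipped to the volume bounds.
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)
    window = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))

    # Channel order is an assumption: (PET, CT, mask).
    return np.stack(
        [pet[window], ct[window], mask[window].astype(pet.dtype)], axis=0
    )


# Usage with synthetic volumes (random data, for shape checking only).
pet = np.random.rand(8, 8, 8).astype(np.float32)
ct = np.random.rand(8, 8, 8).astype(np.float32)
mask = np.zeros((8, 8, 8), dtype=np.uint8)
mask[3:5, 3:5, 3:5] = 1
crop = focal_crop(pet, ct, mask, margin=1)
```

A crop like this keeps the lesion at native resolution while discarding most of the whole-body volume, which is one plausible reason to pair a global view with a lesion-focused one; whether the paper does so in this way is not stated in the text above.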

Taxonomy

Topics: Multimodal Machine Learning Applications · Medical Imaging Techniques and Applications · Radiomics and Machine Learning in Medical Imaging