RadDiagSeg-M: A Vision Language Model for Joint Diagnosis and Multi-Target Segmentation in Radiology

Chengrun Li; Corentin Royer; Haozhe Luo; Bastian Wittmann; Xia Li; Ibrahim Hamamci; Sezgin Er; Anjany Sekuboyina; Bjoern Menze

arXiv:2510.18188 · cs.CV · October 22, 2025

TL;DR

RadDiagSeg-M is a vision-language model for joint diagnosis and multi-target segmentation in radiology. It is built on an accompanying dataset, RadDiagSeg-D, and aims to make assistive clinical tools more useful by producing both descriptive text and pixel-level segmentation masks.

Contribution

The paper introduces RadDiagSeg-D, a hierarchical dataset for joint diagnosis and segmentation, and proposes RadDiagSeg-M, a model capable of simultaneous abnormality detection, diagnosis, and segmentation.

Findings

01. RadDiagSeg-M achieves strong performance across all task components.

02. The dataset supports multi-modal, multi-target medical imaging tasks.

03. RadDiagSeg-M provides clinically useful, informative outputs.

Abstract

Most current medical vision language models struggle to jointly generate diagnostic text and pixel-level segmentation masks in response to complex visual questions. This is a major obstacle to clinical application, as assistive systems that cannot provide both modalities simultaneously offer limited value to medical practitioners. To address this limitation, we first introduce RadDiagSeg-D, a dataset combining abnormality detection, diagnosis, and multi-target segmentation into a unified and hierarchical task. RadDiagSeg-D covers multiple imaging modalities and is specifically designed to support the development of models that produce descriptive text and corresponding segmentation masks in tandem. Subsequently, we leverage the dataset to propose a novel vision-language model, RadDiagSeg-M, capable of joint abnormality detection, diagnosis, and flexible segmentation.…
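
To make the idea of a unified, hierarchical task more concrete, the following is a minimal sketch of how one sample combining detection, diagnosis, and multi-target segmentation might be represented. It is not the RadDiagSeg-D schema: the class `Sample`, its field names, and the consistency rule in `is_consistent` are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Binary mask as nested lists; 1 marks a pixel belonging to the target.
Mask = List[List[int]]


@dataclass
class Sample:
    """Hypothetical record; field names are illustrative, not the dataset's schema."""
    question: str
    abnormal: bool                    # level 1: abnormality detection
    diagnosis: Optional[str] = None   # level 2: diagnostic text
    # level 3: one mask per named segmentation target
    masks: Dict[str, Mask] = field(default_factory=dict)


def is_consistent(s: Sample) -> bool:
    """Assumed hierarchy: deeper levels are filled only if the level above allows it."""
    if not s.abnormal:
        # A normal scan carries neither a diagnosis nor any masks.
        return s.diagnosis is None and not s.masks
    # An abnormal scan must at least carry a diagnosis; masks are optional.
    return s.diagnosis is not None


normal = Sample(question="Is there an abnormality?", abnormal=False)
finding = Sample(
    question="Is there an abnormality? If so, diagnose and segment it.",
    abnormal=True,
    diagnosis="example finding",
    masks={"target_a": [[0, 1], [1, 1]], "target_b": [[1, 0], [0, 0]]},
)
print(is_consistent(normal), is_consistent(finding))
```

Keeping the masks in a dictionary keyed by target name is one simple way to allow several targets per image; the actual dataset may organize them differently.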

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Topic Modeling