MedSG-Bench: A Benchmark for Medical Image Sequences Grounding

Jingkun Yue; Siqi Zhang; Zinan Jia; Huihuan Xu; Zongbo Han; Xiaohong Liu; Guangyu Wang

arXiv:2505.11852 · cs.CV · May 20, 2025



TL;DR

MedSG-Bench is the first comprehensive benchmark for medical image sequences grounding. It addresses the gap in sequential image analysis for clinical applications and includes datasets, tasks, and models to advance research in this area.

Contribution

The paper presents MedSG-Bench, a novel benchmark with datasets and tasks for medical image sequences grounding. It also introduces MedSeq-Grounder, a model specialized for this task.

Findings

1. Existing models show limitations in sequential medical grounding tasks.
2. The MedSG-188K dataset supports large-scale instruction tuning.
3. MedSeq-Grounder enhances understanding across sequential medical images.

Abstract

Visual grounding is essential for precise perception and reasoning in multimodal large language models (MLLMs), especially in medical imaging domains. While existing medical visual grounding benchmarks primarily focus on single-image scenarios, real-world clinical applications often involve sequential images, where accurate lesion localization across different modalities and temporal tracking of disease progression (e.g., pre- vs. post-treatment comparison) require fine-grained cross-image semantic alignment and context-aware reasoning. To remedy the underrepresentation of image sequences in existing medical visual grounding benchmarks, we propose MedSG-Bench, the first benchmark tailored for Medical Image Sequences Grounding. It comprises eight VQA-style tasks, formulated into two paradigms of the grounding tasks, including 1) Image Difference Grounding, which focuses on detecting…



Videos

MedSG-Bench: A Benchmark for Medical Image Sequences Grounding (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis

Methods: Focus