Med-GLIP: Advancing Medical Language-Image Pre-training with Large-scale Grounded Dataset

Ziye Deng; Ruihan He; Jiaxiang Liu; Yuan Wang; Zijie Meng; Songtao Jiang; Yong Xie; Zuozhu Liu

arXiv:2508.10528·cs.CV·November 7, 2025


TL;DR

The paper introduces Med-GLIP-5M, a large-scale grounded dataset spanning seven imaging modalities, and Med-GLIP, a modality-aware framework for medical image grounding that improves accuracy and generalization in aligning language with image regions across medical imaging tasks.

Contribution

The paper presents Med-GLIP-5M, a new grounded dataset with over 5.3 million region-level annotations, and Med-GLIP, a modality-aware model that learns hierarchical medical semantics without explicit expert modules.

Findings

01

Med-GLIP outperforms existing methods on multiple benchmarks.

02

Integrating Med-GLIP enhances medical VQA and report generation.

03

The Med-GLIP-5M dataset covers seven imaging modalities with hierarchical region labels, ranging from organ-level boundaries to fine-grained lesions.

Abstract

Medical image grounding aims to align natural language phrases with specific regions in medical images, serving as a foundational task for intelligent diagnosis, visual question answering (VQA), and medical report generation (MRG). However, existing research is constrained by limited modality coverage, coarse-grained annotations, and the absence of a unified, generalizable grounding framework. To address these challenges, we construct a large-scale medical grounding dataset, Med-GLIP-5M, comprising over 5.3 million region-level annotations across seven imaging modalities, covering diverse anatomical structures and pathological findings. The dataset supports both segmentation and grounding tasks with hierarchical region labels, ranging from organ-level boundaries to fine-grained lesions. Based on this foundation, we propose Med-GLIP, a modality-aware grounding framework trained on…

Code & Models

Datasets

Med-GLIP/Med-GLIP-5M
dataset · 282 dl
