Med-GLIP: Advancing Medical Language-Image Pre-training with Large-scale Grounded Dataset
Ziye Deng, Ruihan He, Jiaxiang Liu, Yuan Wang, Zijie Meng, Songtao Jiang, Yong Xie, Zuozhu Liu

TL;DR
Med-GLIP pairs a large-scale, diverse grounded dataset (Med-GLIP-5M) with a modality-aware framework for medical image grounding, improving accuracy and generalization in aligning language with image regions across medical imaging tasks.
Contribution
The paper presents Med-GLIP-5M, a grounded dataset with over 5.3 million region-level annotations, and Med-GLIP, a modality-aware grounding framework that learns hierarchical medical semantics without explicit expert modules.
Findings
Med-GLIP outperforms existing methods on multiple benchmarks.
Integrating Med-GLIP enhances medical VQA and report generation.
The Med-GLIP-5M dataset covers seven imaging modalities with hierarchical region labels, ranging from organ-level boundaries to fine-grained lesions.
Abstract
Medical image grounding aims to align natural language phrases with specific regions in medical images, serving as a foundational task for intelligent diagnosis, visual question answering (VQA), and automated report generation (MRG). However, existing research is constrained by limited modality coverage, coarse-grained annotations, and the absence of a unified, generalizable grounding framework. To address these challenges, we construct a large-scale medical grounding dataset Med-GLIP-5M comprising over 5.3 million region-level annotations across seven imaging modalities, covering diverse anatomical structures and pathological findings. The dataset supports both segmentation and grounding tasks with hierarchical region labels, ranging from organ-level boundaries to fine-grained lesions. Based on this foundation, we propose Med-GLIP, a modality-aware grounding framework trained on…
