MedGround: Bridging the Evidence Gap in Medical Vision-Language Models with Verified Grounding Data

Mengmeng Zhang; Xiaoping Wu; Hao Luo; Fan Wang; Yisheng Lv

arXiv:2601.06847·cs.CV·January 13, 2026

Open Access

TL;DR

MedGround is a scalable pipeline, accompanied by the MedGround-35K dataset, that supplies verified grounding data for medical vision-language models, improving how well their clinical statements are visually grounded and how well they generalize.

Contribution

We develop MedGround, an automated method to generate high-quality medical grounding data from segmentation resources, enhancing VLM training and performance.

Findings

01. Improved grounding accuracy in VLMs trained with MedGround-35K
02. Enhanced multi-object semantic disambiguation
03. Strong generalization to unseen grounding scenarios

Abstract

Vision-Language Models (VLMs) can generate convincing clinical narratives, yet frequently struggle to visually ground their statements. We posit this limitation arises from the scarcity of high-quality, large-scale clinical referring-localization pairs. To address this, we introduce MedGround, an automated pipeline that transforms segmentation resources into high-quality medical referring grounding data. Leveraging expert masks as spatial anchors, MedGround precisely derives localization targets, extracts shape and spatial cues, and guides VLMs to synthesize natural, clinically grounded queries that reflect morphology and location. To ensure data rigor, a multi-stage verification system integrates strict formatting checks, geometry- and medical-prior rules, and image-based visual judging to filter out ambiguous or visually unsupported samples. Finally, we present MedGround-35K, a novel…
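To make the idea of using a mask as a spatial anchor concrete, here is a minimal sketch of how a binary segmentation mask could yield a localization target plus coarse shape and location cues, followed by one geometry-style sanity check. It is an illustration only, not the authors' implementation: the function names `describe_mask` and `passes_geometry_rules`, the list-of-lists mask format, the thirds-based location labels, and the area thresholds are all assumptions made for this example.

```python
from typing import Any, Dict, List, Optional

Mask = List[List[int]]  # binary mask as rows of 0/1 (assumed input format)


def describe_mask(mask: Mask) -> Optional[Dict[str, Any]]:
    """Derive a bounding box and coarse shape/location cues from a binary mask."""
    height, width = len(mask), len(mask[0])
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pixels:
        return None  # empty mask: nothing to localize

    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    box_w, box_h = x1 - x0 + 1, y1 - y0 + 1

    # Centroid of the foreground pixels, in image coordinates.
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)

    # Coarse location by splitting the image into thirds (illustrative choice).
    # "left"/"right" refer to the image, not to the patient's anatomy.
    horiz = "left" if cx < width / 3 else "right" if cx >= 2 * width / 3 else "center"
    vert = "upper" if cy < height / 3 else "lower" if cy >= 2 * height / 3 else "middle"

    return {
        "bbox": (x0, y0, x1, y1),                      # inclusive pixel coordinates
        "area_fraction": len(pixels) / (width * height),
        "aspect_ratio": box_w / box_h,                 # > 1 means wider than tall
        "fill_ratio": len(pixels) / (box_w * box_h),   # 1.0 means a solid rectangle
        "location": f"{vert} {horiz}",
    }


def passes_geometry_rules(
    cues: Optional[Dict[str, Any]],
    min_area_fraction: float = 0.001,  # hypothetical threshold
    max_area_fraction: float = 0.9,    # hypothetical threshold
) -> bool:
    """Reject empty, vanishingly small, or near-full-image targets."""
    if cues is None:
        return False
    return min_area_fraction <= cues["area_fraction"] <= max_area_fraction


demo = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
cues = describe_mask(demo)
print(cues["bbox"], cues["location"], passes_geometry_rules(cues))
```

In a pipeline of the kind the abstract describes, cues like these would be passed to a VLM as hints for writing the referring query, and the geometry check would be only one of several filters alongside format checks and image-based judging.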

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning