LLM-Guided Diagnostic Evidence Alignment for Medical Vision-Language Pretraining under Limited Pairing

Huimin Yan; Liang Bai; Xian Yang; Long Chen

arXiv:2602.07540 · cs.CV · February 10, 2026

Open Access

TL;DR

This paper introduces LGDEA, a novel medical vision-language pretraining method that uses LLMs to align diagnostic evidence across modalities, reducing dependence on paired data and improving performance in medical image analysis tasks.

Contribution

The paper proposes evidence-level alignment guided by LLMs, enabling effective use of unpaired data and improving diagnostic representation learning in medical vision-language pretraining.

Findings

01. LGDEA achieves significant improvements in phrase grounding, image-text retrieval, and zero-shot classification.

02. It rivals methods that rely on large amounts of paired data.

03. It effectively exploits unpaired medical images and reports.

Abstract

Most existing CLIP-style medical vision-language pretraining methods rely on global or local alignment with substantial paired data. However, global alignment is easily dominated by non-diagnostic information, while local alignment fails to integrate key diagnostic evidence. As a result, learning reliable diagnostic representations becomes difficult, which limits their applicability in medical scenarios with limited paired data. To address this issue, we propose an LLM-Guided Diagnostic Evidence Alignment method (LGDEA), which shifts the pretraining objective toward evidence-level alignment that is more consistent with the medical diagnostic process. Specifically, we leverage LLMs to extract key diagnostic evidence from radiology reports and construct a shared diagnostic evidence space, enabling evidence-aware cross-modal alignment and allowing LGDEA to effectively exploit abundant…
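
For orientation, the CLIP-style global alignment that the abstract contrasts LGDEA with can be sketched as a symmetric contrastive loss over a batch of paired image and report embeddings. This is a minimal illustration of that baseline objective only, not of LGDEA's evidence-level alignment, whose loss is not given in the text shown here. The function names, the NumPy implementation, and the temperature value of 0.07 are assumptions made for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_global_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of paired embeddings.

    Row i of image_emb and row i of text_emb are assumed to come from the
    same image-report pair; every other row in the batch acts as a negative.
    The temperature default is an illustrative assumption.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray(text_emb, dtype=float))
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    # Image-to-text: each row should put its mass on the diagonal entry.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each column should put its mass on the diagonal entry.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

Because the loss compares whole-image and whole-report embeddings, any content shared within a pair contributes to the similarity score, including non-diagnostic content. That is the weakness of global alignment the abstract points to.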

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI