Distribution-Based Masked Medical Vision-Language Model Using Structured Reports

Shreyank N Gowda; Ruichi Zhang; Xiao Gu; Ying Weng; Lu Yang

arXiv:2507.21794·cs.CV·July 30, 2025

TL;DR

This paper presents an uncertainty-aware medical image-text pre-training model that uses structured reports generated by a large language model to improve clinical understanding and performance on medical image analysis tasks.

Contribution

It introduces a novel approach using structured reports and uncertainty modeling to enhance medical vision-language pre-training, especially for Chest X-Rays.

Findings

01. Achieves state-of-the-art results on multiple downstream tasks.

02. Effectively models clinical uncertainty and ambiguity.

03. Improves generalization in medical image analysis.

Abstract

Medical image-language pre-training aims to align medical images with clinically relevant text to improve model performance on various downstream tasks. However, existing models often struggle with the variability and ambiguity inherent in medical data, limiting their ability to capture nuanced clinical information and uncertainty. This work introduces an uncertainty-aware medical image-text pre-training model that enhances generalization capabilities in medical image analysis. Building on previous methods and focusing on Chest X-Rays, our approach utilizes structured text reports generated by a large language model (LLM) to augment image data with clinically relevant context. These reports begin with a definition of the disease, followed by the 'appearance' section to highlight critical regions of interest, and finally 'observations' and 'verdicts' that ground model predictions in…
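The report layout described in the abstract (definition, then appearance, then observations and verdicts) can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class name `StructuredReport`, its field names, the `to_text` helper, and the example text are all hypothetical illustrations, not the authors' format or prompt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuredReport:
    """Hypothetical container mirroring the report sections named in the abstract."""
    disease: str
    definition: str                                          # what the disease is
    appearance: str                                          # critical regions of interest
    observations: List[str] = field(default_factory=list)   # what is seen in the image
    verdict: str = ""                                        # conclusion drawn from the observations

    def to_text(self) -> str:
        """Join the sections in the order the abstract lists them."""
        lines = [
            f"Definition: {self.definition}",
            f"Appearance: {self.appearance}",
        ]
        lines += [f"Observation: {obs}" for obs in self.observations]
        lines.append(f"Verdict: {self.verdict}")
        return "\n".join(lines)


# Illustrative content only; not taken from the paper or any dataset.
report = StructuredReport(
    disease="pneumothorax",
    definition="Air in the pleural space between the lung and chest wall.",
    appearance="A thin pleural line with no lung markings beyond it.",
    observations=["Pleural line visible at the right apex."],
    verdict="Findings are consistent with pneumothorax.",
)
text = report.to_text()
print(text)
```

In a pre-training pipeline, text like this would presumably be paired with the corresponding image as the language input; how the paper encodes it and models uncertainty is not covered by this sketch.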
