Distribution-Based Masked Medical Vision-Language Model Using Structured Reports
Shreyank N Gowda, Ruichi Zhang, Xiao Gu, Ying Weng, Lu Yang

TL;DR
This paper presents an uncertainty-aware medical image-text pre-training model that leverages structured reports from large language models to improve clinical understanding and performance in medical image analysis tasks.
Contribution
It introduces a novel approach using structured reports and uncertainty modeling to enhance medical vision-language pre-training, especially for Chest X-Rays.
Findings
Achieves state-of-the-art results on multiple downstream tasks.
Effectively models clinical uncertainty and ambiguity.
Improves generalization in medical image analysis.
Abstract
Medical image-language pre-training aims to align medical images with clinically relevant text to improve model performance on various downstream tasks. However, existing models often struggle with the variability and ambiguity inherent in medical data, limiting their ability to capture nuanced clinical information and uncertainty. This work introduces an uncertainty-aware medical image-text pre-training model that enhances generalization capabilities in medical image analysis. Building on previous methods and focusing on Chest X-Rays, our approach utilizes structured text reports generated by a large language model (LLM) to augment image data with clinically relevant context. These reports begin with a definition of the disease, followed by the 'appearance' section to highlight critical regions of interest, and finally 'observations' and 'verdicts' that ground model predictions in…
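The report layout described in the abstract (definition, then appearance, then observations and verdicts) can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class name `StructuredReport`, its field names, the `to_text` helper, and the placeholder strings are all hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass
class StructuredReport:
    """Hypothetical container for the four report sections named in the abstract."""

    definition: str    # definition of the disease
    appearance: str    # description highlighting critical regions of interest
    observations: str  # observations section
    verdicts: str      # verdicts section

    def to_text(self) -> str:
        """Join the sections in the order the abstract lists them."""
        sections = [
            ("Definition", self.definition),
            ("Appearance", self.appearance),
            ("Observations", self.observations),
            ("Verdicts", self.verdicts),
        ]
        return "\n".join(f"{name}: {body.strip()}" for name, body in sections)


# Placeholder strings only; no real clinical content is implied.
report = StructuredReport(
    definition="<LLM-generated definition of the disease>",
    appearance="<LLM-generated description of regions of interest>",
    observations="<LLM-generated observations>",
    verdicts="<LLM-generated verdicts>",
)
report_text = report.to_text()
```

Keeping the sections as separate fields, rather than one free-text string, would let a pipeline pair each section with the image independently, though whether the authors do so is not stated in the text above.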
