MedProbCLIP: Probabilistic Adaptation of Vision-Language Foundation Model for Reliable Radiograph-Report Retrieval

Ahmad Elallaf; Yu Zhang; Yuktha Priya Masupalli; Jeong Yang; Young Lee; Zechun Cao; Gongbo Liang

arXiv:2602.16019·cs.CV·February 19, 2026

Open Access

TL;DR

MedProbCLIP introduces a probabilistic vision-language model for chest X-ray and report retrieval, capturing uncertainty and improving reliability in high-stakes medical applications.

Contribution

MedProbCLIP is described as the first framework to model radiograph and report embeddings as Gaussian distributions, trained with a probabilistic contrastive objective so that the learned representations explicitly capture uncertainty.

Findings

01. Outperforms existing models in retrieval and classification tasks.

02. Demonstrates superior calibration and robustness to corruptions.

03. Requires only a single radiograph and a single report at inference.

Abstract

Vision-language foundation models have emerged as powerful general-purpose representation learners with strong potential for multimodal understanding, but their deterministic embeddings often fail to provide the reliability required for high-stakes biomedical applications. This work introduces MedProbCLIP, a probabilistic vision-language learning framework for chest X-ray and radiology report representation learning and bidirectional retrieval. MedProbCLIP models image and text representations as Gaussian embeddings through a probabilistic contrastive objective that explicitly captures uncertainty and many-to-many correspondences between radiographs and clinical narratives. A variational information bottleneck mitigates overconfident predictions, while MedProbCLIP employs multi-view radiograph encoding and multi-section report encoding during training to provide fine-grained supervision…
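
The abstract does not spell out the loss itself, so the following is only a minimal sketch of the general idea of Gaussian embeddings with a contrastive match score and a KL regularizer, not the authors' implementation. It assumes diagonal covariances, a sigmoid match probability driven by the closed-form expected squared distance between two Gaussians, and a KL term toward a standard normal prior standing in for the variational information bottleneck. All function names and the `scale`, `shift`, and `beta` parameters are hypothetical.

```python
import math


def expected_sq_distance(mu_a, var_a, mu_b, var_b):
    """E||z_a - z_b||^2 for independent diagonal Gaussians z_a, z_b.

    Closed form: ||mu_a - mu_b||^2 + sum(var_a) + sum(var_b).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    return mean_term + sum(var_a) + sum(var_b)


def match_probability(mu_a, var_a, mu_b, var_b, scale=1.0, shift=0.0):
    """Hypothetical match score in (0, 1): sigmoid(shift - scale * distance)."""
    x = shift - scale * expected_sq_distance(mu_a, var_a, mu_b, var_b)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # numerically stable branch for negative x
    return e / (1.0 + e)


def kl_to_standard_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ), used here as a bottleneck-style penalty."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mu, var))


def pair_loss(mu_img, var_img, mu_txt, var_txt, is_match, beta=1e-3):
    """Binary cross-entropy on the match score plus a beta-weighted KL penalty.

    This is an assumed, simplified objective for one image-report pair.
    """
    p = match_probability(mu_img, var_img, mu_txt, var_txt)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
    bce = -math.log(p) if is_match else -math.log(1.0 - p)
    kl = kl_to_standard_normal(mu_img, var_img) + kl_to_standard_normal(mu_txt, var_txt)
    return bce + beta * kl


if __name__ == "__main__":
    image = ([1.0, 0.0], [0.1, 0.1])        # (mean, variance) of a toy radiograph embedding
    near_report = ([1.0, 0.0], [0.1, 0.1])  # toy report embedding close to the image
    far_report = ([-1.0, 0.0], [0.1, 0.1])  # toy report embedding far from the image
    print(match_probability(*image, *near_report))
    print(match_probability(*image, *far_report))
```

In this sketch a larger variance on either side increases the expected distance and therefore lowers the match score, which is one simple way an uncertain embedding can be prevented from producing a confident match.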

Taxonomy

Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Topic Modeling