Prototype-Enhanced Confidence Modeling for Cross-Modal Medical Image-Report Retrieval

Shreyank N Gowda; Xiaobo Jin; Christian Wagner

arXiv:2508.03494 · cs.CV · August 6, 2025

TL;DR

This paper introduces PECM, a novel framework that uses multi-level prototypes and confidence modeling to improve the accuracy and robustness of cross-modal medical image-report retrieval, especially under data ambiguity.

Contribution

The paper proposes a prototype-enhanced confidence modeling framework that captures semantic variability and improves the reliability of medical image-report retrieval.

Findings

1. Achieves a performance gain of up to 10.17% on retrieval tasks.
2. Improves robustness and consistency in clinical scenarios.
3. Effective in both supervised and zero-shot settings.

Abstract

In cross-modal retrieval tasks, such as image-to-report and report-to-image retrieval, accurately aligning medical images with relevant text reports is essential but challenging due to the inherent ambiguity and variability in medical data. Existing models often struggle to capture the nuanced, multi-level semantic relationships in radiology data, leading to unreliable retrieval results. To address these issues, we propose the Prototype-Enhanced Confidence Modeling (PECM) framework, which introduces multi-level prototypes for each modality to better capture semantic variability and enhance retrieval robustness. PECM employs a dual-stream confidence estimation that leverages prototype similarity distributions and an adaptive weighting mechanism to control the impact of high-uncertainty data on retrieval rankings. Applied to radiology image-report datasets, our method achieves significant…
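The abstract does not spell out how the confidence estimate or the adaptive weighting is computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' formulation. It assumes that confidence is one minus the normalized entropy of a softmax over cosine similarities to a modality's prototypes, and that the image-report score is the cosine similarity scaled by the geometric mean of the two confidences. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs, temperature=0.1):
    """Numerically stable softmax; the temperature is an assumed value."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_confidence(embedding, prototypes, temperature=0.1):
    """ASSUMPTION: confidence = 1 - normalized entropy of the
    prototype similarity distribution. Requires >= 2 prototypes."""
    sims = [cosine(embedding, p) for p in prototypes]
    probs = softmax(sims, temperature)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - entropy / math.log(len(prototypes)))


def weighted_score(image_emb, report_emb, image_protos, report_protos):
    """ASSUMPTION: down-weight the raw similarity by the geometric
    mean of the per-modality confidences."""
    conf_img = prototype_confidence(image_emb, image_protos)
    conf_txt = prototype_confidence(report_emb, report_protos)
    return math.sqrt(conf_img * conf_txt) * cosine(image_emb, report_emb)


# Toy 2-D example with two orthogonal prototypes per modality.
protos = [[1.0, 0.0], [0.0, 1.0]]
clear = [1.0, 0.0]       # sits on one prototype: low uncertainty
ambiguous = [1.0, 1.0]   # equidistant from both: high uncertainty

clear_score = weighted_score(clear, clear, protos, protos)
ambiguous_score = weighted_score(ambiguous, ambiguous, protos, protos)
```

In this toy setup both pairs have a raw cosine similarity of 1, but the ambiguous pair is pushed toward a score of 0 because its prototype similarity distribution is uniform. This illustrates how high-uncertainty data can be kept from dominating a retrieval ranking.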
