PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
Pujin Cheng, Li Lin, Junyan Lyu, Yijin Huang, Wenhan Luo, Xiaoying Tang

TL;DR
PRIOR introduces a prototype-based joint learning framework for medical images and reports, leveraging local and global alignment, cross-modality reconstruction, and a non-auto-regressive generation paradigm to improve diverse medical vision-language tasks.
Contribution
It proposes a novel prototype representation learning framework with local alignment and cross-modality reconstruction for medical vision-language understanding.
Findings
Outperforms state-of-the-art methods on multiple medical vision-language tasks.
Effective in both supervised and zero-shot classification scenarios.
Enhances image-to-text retrieval, segmentation, and detection performance.
Abstract
Contrastive learning based vision-language joint pre-training has emerged as a successful representation learning strategy. In this paper, we present a prototype representation learning framework incorporating both global and local alignment between medical images and reports. In contrast to standard global multi-modality alignment methods, we employ a local alignment module for fine-grained representation. Furthermore, a cross-modality conditional reconstruction module is designed to interchange information across modalities in the training phase by reconstructing masked images and reports. For reconstructing long reports, a sentence-wise prototype memory bank is constructed, enabling the network to focus on low-level localized visual and high-level clinical linguistic features. Additionally, a non-auto-regressive generation paradigm is proposed for reconstructing non-sequential…
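Illustrative sketch (not the authors' code): the abstract names two ideas that are easy to show in miniature, global image-report alignment through a contrastive objective and a sentence-wise prototype memory bank. The Python below is a minimal sketch under stated assumptions. The function names, the temperature of 0.07, and the nearest-neighbour lookup are hypothetical choices made for illustration; the paper's actual loss terms, local alignment module, and the way prototypes are learned may differ.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(row, target):
    """Log-softmax of `row` evaluated at index `target`, computed stably."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
    return row[target] - log_z


def symmetric_contrastive_loss(image_embs, report_embs, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    Pair k (image k, report k) is the positive; every other pairing in the
    batch is a negative. The temperature value is an assumption.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in report_embs]
    n = len(imgs)
    # Cosine similarity of every image with every report, scaled by temperature.
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    # Image-to-report direction: softmax over each row.
    i2t = -sum(log_softmax_at(logits[k], k) for k in range(n)) / n
    # Report-to-image direction: softmax over each column.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax_at(cols[k], k) for k in range(n)) / n
    return 0.5 * (i2t + t2i)


def nearest_prototype(sentence_emb, prototypes):
    """Index of the prototype most similar (by cosine) to a sentence embedding.

    A fixed list stands in for the memory bank here; in a real system the
    prototypes would be learned during training.
    """
    sent = normalize(sentence_emb)
    sims = [dot(sent, normalize(p)) for p in prototypes]
    return max(range(len(sims)), key=sims.__getitem__)
```

In this toy setup, correctly paired embeddings yield a loss close to zero and mismatched pairs a much larger one, while `nearest_prototype` maps each sentence embedding to a discrete prototype index. Predicting such indices, rather than free-form tokens, is one plausible way to read the abstract's description of reconstructing long reports sentence by sentence.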
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
MethodsFocus
