Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Fuying Wang, Yuyin Zhou, Shujun Wang, Varut Vardhanabhuti, Lequan Yu

TL;DR
This paper introduces a multi-granularity cross-modal alignment framework that leverages semantic correspondences at multiple levels to improve generalized medical visual representation learning from radiology reports.
Contribution
It proposes the MGCA framework, which aligns medical images and radiology reports at the pathological region, instance, and disease levels, so that representation learning is not limited to instance-level or local supervision.
Findings
Outperforms existing methods on seven medical image datasets.
Improves performance in classification, detection, and segmentation tasks.
Demonstrates stable and superior results across various tasks.
Abstract
Learning medical visual representations directly from paired radiology reports has become an emerging topic in representation learning. However, existing medical image-text joint learning methods are limited by instance or local supervision analysis, ignoring disease-level semantic correspondences. In this paper, we present a novel Multi-Granularity Cross-modal Alignment (MGCA) framework for generalized medical visual representation learning by harnessing the naturally exhibited semantic correspondences between medical images and radiology reports at three different levels, i.e., pathological region-level, instance-level, and disease-level. Specifically, we first incorporate the instance-wise alignment module by maximizing the agreement between image-report pairs. Further, for token-wise alignment, we introduce a bidirectional cross-attention strategy to explicitly learn the matching…
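The abstract describes instance-wise alignment as maximizing the agreement between image-report pairs. One common way to express that idea is a symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. The sketch below is an illustration under that assumption, not the authors' implementation: the function names, the temperature value of 0.07, and the use of NumPy rather than a deep learning framework are all hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def instance_alignment_loss(image_emb, report_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of paired embeddings.

    Row i of image_emb and row i of report_emb are assumed to come from the
    same study (a positive pair); every other row in the batch is treated as
    a negative. The temperature is an assumed hyperparameter.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=np.float64))
    txt = l2_normalize(np.asarray(report_emb, dtype=np.float64))

    # logits[i, j] = cosine similarity of image i and report j, scaled.
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])

    # Image-to-report: each image should pick out its own report (row-wise).
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Report-to-image: each report should pick out its own image (column-wise).
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

Calling `instance_alignment_loss(image_emb, report_emb)` on two arrays of shape `(batch, dim)` returns a scalar that shrinks as matched pairs become more similar than mismatched ones. The token-wise and disease-level objectives mentioned in the abstract are not covered by this sketch.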
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research
Methods: Contrastive Learning · ALIGN
