Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning

Fuying Wang; Yuyin Zhou; Shujun Wang; Varut Vardhanabhuti; Lequan Yu

arXiv:2210.06044 · cs.CV · October 13, 2022 · 60 citations

TL;DR

This paper introduces a multi-granularity cross-modal alignment framework that leverages semantic correspondences at multiple levels to improve generalized medical visual representation learning from radiology reports.

Contribution

It proposes MGCA, a framework that aligns medical images and radiology reports at three levels (pathological region, instance, and disease), going beyond the instance-level or local supervision used by earlier image-text methods.

Findings

1. Outperforms existing methods on seven medical image datasets.
2. Improves performance on classification, detection, and segmentation tasks.
3. Demonstrates stable and superior results across these tasks.

Abstract

Learning medical visual representations directly from paired radiology reports has become an emerging topic in representation learning. However, existing medical image-text joint learning methods are limited to instance-level or local supervision, ignoring disease-level semantic correspondences. In this paper, we present a novel Multi-Granularity Cross-modal Alignment (MGCA) framework for generalized medical visual representation learning by harnessing the naturally exhibited semantic correspondences between medical images and radiology reports at three different levels, i.e., pathological region-level, instance-level, and disease-level. Specifically, we first incorporate the instance-wise alignment module by maximizing the agreement between image-report pairs. Further, for token-wise alignment, we introduce a bidirectional cross-attention strategy to explicitly learn the matching…
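The two alignment steps named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the image and report encoders already output one global embedding per image and per report, plus patch-level and word-level token embeddings, all projected into a shared D-dimensional space. The instance-level part is a standard symmetric InfoNCE loss over a batch of matched pairs; the token-level part lets words attend over patches and patches attend over words, then contrasts each token with its own cross-attended counterpart. All function names, the temperature of 0.1, and the choice to reuse keys as values (no learned query/key/value projections) are hypothetical simplifications, not the authors' released implementation, and the disease-level component is not covered.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each vector to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def instance_infonce(img_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE over N paired embeddings (hypothetical helper).

    img_emb, txt_emb: arrays of shape (N, D); row i of each is a matched pair.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    idx = np.arange(len(img))
    # Image-to-report: the matched report sits on the diagonal of each row.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Report-to-image: same target, normalizing over columns instead of rows.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_i2t + loss_t2i) / 2


def cross_attend(queries, keys):
    """Scaled dot-product attention from queries (Q, D) over keys (K, D).

    Keys double as values in this simplified sketch; returns shape (Q, D).
    """
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = np.exp(log_softmax(scores, axis=1))  # each row sums to 1
    return weights @ keys


def bidirectional_cross_attention(patch_tokens, word_tokens):
    # Words attend over image patches and patches attend over words, so every
    # token gets a cross-modal counterpart it can be aligned with.
    word_ctx = cross_attend(word_tokens, patch_tokens)   # (W, D)
    patch_ctx = cross_attend(patch_tokens, word_tokens)  # (P, D)
    return word_ctx, patch_ctx


def token_alignment_loss(patch_tokens, word_tokens, temperature=0.1):
    # Hypothetical token-level objective: each token should be closer to its
    # own cross-attended counterpart than to the counterparts of other tokens
    # from the same image-report pair.
    word_ctx, patch_ctx = bidirectional_cross_attention(patch_tokens, word_tokens)
    return (instance_infonce(word_tokens, word_ctx, temperature)
            + instance_infonce(patch_tokens, patch_ctx, temperature)) / 2
```

In a training loop, both terms would be computed on encoder outputs and summed (possibly with weights); the paper's exact token-level formulation and loss weighting may differ from this sketch.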

Code & Models

Videos

Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research

Methods: Contrastive Learning · ALIGN