Contrastive Learning of Medical Visual Representations from Paired Images and Text

Yuhao Zhang; Hang Jiang; Yasuhide Miura; Christopher D. Manning; Curtis P. Langlotz

arXiv:2010.00747 · cs.CV · September 21, 2022 · 278 cites

Open Access · 5 Repos · 4 Models

TL;DR

This paper introduces ConVIRT, a domain-agnostic unsupervised contrastive learning method that leverages paired text and images to learn effective medical visual representations, significantly reducing the need for labeled data.

Contribution

ConVIRT is a novel unsupervised pretraining approach that exploits paired text and images in medical data, outperforming traditional transfer learning and requiring less labeled data.

Findings

01. ConVIRT outperforms strong baselines on classification and retrieval tasks.

02. ConVIRT needs only 10% as much labeled training data as an ImageNet-initialized model to reach comparable performance.

03. Pretraining with paired text significantly improves medical image representations.

Abstract

Learning visual representations of medical images (e.g., X-rays) is core to medical image understanding but its progress has been held back by the scarcity of human annotations. Existing work commonly relies on fine-tuning weights transferred from ImageNet pretraining, which is suboptimal due to drastically different image characteristics, or rule-based label extraction from the textual report data paired with medical images, which is inaccurate and hard to generalize. Meanwhile, several recent studies show exciting results from unsupervised contrastive learning from natural images, but we find these methods help little on medical images because of their high inter-class similarity. We propose ConVIRT, an alternative unsupervised strategy to learn medical visual representations by exploiting naturally occurring paired descriptive text. Our new method of pretraining medical image…
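The idea of learning from paired images and text can be sketched as a bidirectional contrastive loss: each image embedding should be more similar to its own report embedding than to the other reports in the batch, and vice versa. The block below is a minimal sketch under stated assumptions, not the authors' implementation. The loss form (softmax cross-entropy over cosine similarities), the `temperature` and `image_weight` defaults, and all function names are illustrative choices, since the abstract excerpt above does not spell them out. It uses plain Python lists in place of real encoder outputs.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_to_diagonal(sim_rows, temperature):
    """Mean softmax cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(sim_rows):
        logits = [s / temperature for s in row]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_denominator - logits[i]
    return total / len(sim_rows)


def bidirectional_contrastive_loss(image_vecs, text_vecs,
                                   temperature=0.1, image_weight=0.5):
    """Weighted sum of image-to-text and text-to-image contrastive losses.

    image_vecs[i] and text_vecs[i] are assumed to be embeddings of a true pair;
    every other combination in the batch acts as a negative.
    temperature and image_weight are illustrative defaults (assumptions).
    """
    images = [_normalize(v) for v in image_vecs]
    texts = [_normalize(u) for u in text_vecs]
    # sim[i][j] = cosine similarity between image i and text j
    sim = [[sum(a * b for a, b in zip(v, u)) for u in texts] for v in images]
    sim_transposed = [list(column) for column in zip(*sim)]
    image_to_text = _cross_entropy_to_diagonal(sim, temperature)
    text_to_image = _cross_entropy_to_diagonal(sim_transposed, temperature)
    return image_weight * image_to_text + (1.0 - image_weight) * text_to_image


# Toy batch of two pairs (hypothetical 2-d embeddings).
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[0.9, 0.1], [0.1, 0.9]]
shuffled_texts = [[0.1, 0.9], [0.9, 0.1]]

matched = bidirectional_contrastive_loss(images, matched_texts)
shuffled = bidirectional_contrastive_loss(images, shuffled_texts)
print(matched < shuffled)  # correctly paired batches should score a lower loss
```

In a real setup the lists would be replaced by batched outputs of an image encoder and a text encoder, and the loss would be minimized with respect to both encoders' parameters.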

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Image Retrieval and Classification Techniques