Contrastive Learning of Medical Visual Representations from Paired Images and Text
Yuhao Zhang, Hang Jiang, Yasuhide Miura, Christopher D. Manning, Curtis P. Langlotz

TL;DR
This paper introduces ConVIRT, a domain-agnostic unsupervised contrastive learning method that leverages paired text and images to learn effective medical visual representations, significantly reducing the need for labeled data.
Contribution
ConVIRT is a novel unsupervised pretraining approach that exploits paired text and images in medical data, outperforming traditional transfer learning and requiring less labeled data.
Findings
ConVIRT outperforms strong baselines in classification and retrieval tasks.
ConVIRT needs only about 10% as much labeled training data as an ImageNet-initialized model to reach comparable performance.
Pretraining with paired text improves medical image representations significantly.
Abstract
Learning visual representations of medical images (e.g., X-rays) is core to medical image understanding but its progress has been held back by the scarcity of human annotations. Existing work commonly relies on fine-tuning weights transferred from ImageNet pretraining, which is suboptimal due to drastically different image characteristics, or rule-based label extraction from the textual report data paired with medical images, which is inaccurate and hard to generalize. Meanwhile, several recent studies show exciting results from unsupervised contrastive learning from natural images, but we find these methods help little on medical images because of their high inter-class similarity. We propose ConVIRT, an alternative unsupervised strategy to learn medical visual representations by exploiting naturally occurring paired descriptive text. Our new method of pretraining medical image…
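To make the idea of contrasting true image-text pairs against random pairs concrete, here is a minimal NumPy sketch of a generic bidirectional contrastive (InfoNCE-style) loss. It is an illustration under stated assumptions, not the authors' implementation: the function name, the `temperature` and `image_weight` parameters and their default values are hypothetical, and the embeddings are assumed to come from image and text encoders that are not shown.

```python
import numpy as np


def _log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def bidirectional_contrastive_loss(image_emb, text_emb,
                                   temperature=0.1, image_weight=0.5):
    """Generic bidirectional contrastive loss over a batch of pairs.

    Row i of image_emb and row i of text_emb are assumed to be a true
    pair; every other row in the batch serves as a negative.
    temperature and image_weight are illustrative defaults (assumptions).
    """
    # L2-normalize so the dot product below is a cosine similarity.
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    u = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # logits[i, j] = scaled similarity between image i and text j.
    logits = (v @ u.T) / temperature
    idx = np.arange(logits.shape[0])

    # Image-to-text: for each image, pick its text among all texts.
    loss_i2t = -_log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: for each text, pick its image among all images.
    loss_t2i = -_log_softmax(logits, axis=0)[idx, idx].mean()

    return image_weight * loss_i2t + (1.0 - image_weight) * loss_t2i


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(8, 16))          # stand-in image embeddings
    texts = images + 0.1 * rng.normal(size=(8, 16))  # roughly matching texts
    print("matched pairs: ", bidirectional_contrastive_loss(images, texts))
    print("shuffled pairs:", bidirectional_contrastive_loss(images, texts[::-1]))
```

In this sketch the loss is low when each image embedding is closest to its own text embedding and high when the pairing is scrambled, which is the signal a contrastive pretraining objective of this kind optimizes.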
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Image Retrieval and Classification Techniques
