Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging
Gongbo Liang, Connor Greenwell, Yu Zhang, Xiaoqin Wang, Ramakanth Kavuluru, Nathan Jacobs

TL;DR
This paper introduces a contrastive cross-modal pre-training strategy that uses textual medical reports as weak supervision to improve medical image classification when only a small labeled dataset is available, substantially reducing the number of labeled examples required.
Contribution
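The authors propose a contrastive pre-training method that learns image features by matching images with their paired textual reports, then fine-tunes the resulting feature extractor on a small labeled dataset. The method applies to any imaging task for which image-text pairs are available.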
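The summary does not spell out the exact training objective. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of image and report embeddings: the i-th image and the i-th report form the positive pair, and every other item in the batch acts as a negative. The function names and the `temperature=0.1` default are hypothetical choices, not taken from the paper.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def _log_softmax_at(row, index):
    """Log-probability of entry `index` under a softmax over `row` (numerically stable)."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[index] - log_z


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Hypothetical symmetric InfoNCE loss (an assumed, CLIP-style objective).

    Image i and report i are the positive pair; all other pairings in the
    batch are treated as negatives.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and report j, divided by temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: pick the correct report for each image (row-wise softmax)
    loss_i2t = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text -> image: pick the correct image for each report (column-wise softmax)
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(_log_softmax_at(cols[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]   # toy image embeddings
    reports = [[0.9, 0.1], [0.1, 0.9]]  # toy report embeddings, aligned with images
    aligned = contrastive_loss(images, reports)
    shuffled = contrastive_loss(images, list(reversed(reports)))
    print(aligned < shuffled)  # True: correctly paired reports yield a lower loss
```

Minimizing this kind of loss pushes each image embedding toward its own report and away from the others, which is one way a feature extractor can learn from reports without manual labels.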
Findings
Achieved consistent performance improvements on three classification tasks.
Reduced labeled data needs by 67%-98%.
Applicable to any task with available text-image pairs.
Abstract
A key challenge in training neural networks for a given medical imaging task is often the difficulty of obtaining a sufficient number of manually labeled examples. In contrast, textual imaging reports, which are often readily available in medical records, contain rich but unstructured interpretations written by experts as part of standard clinical practice. We propose using these textual reports as a form of weak supervision to improve the image interpretation performance of a neural network without requiring additional manually labeled examples. We use an image-text matching task to train a feature extractor and then fine-tune it in a transfer learning setting for a supervised task using a small labeled dataset. The end result is a neural network that automatically interprets imagery without requiring textual reports during inference. This approach can be applied to any task for which…
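The second stage described in the abstract (reuse the pre-trained feature extractor, train on a small labeled set, then classify images without any report text) can be sketched as follows. This is a simplified, hypothetical illustration: `pretrained_image_encoder` is a placeholder for the image branch learned during pre-training, and instead of fine-tuning the whole network, as the abstract describes, it trains only a logistic-regression head on frozen features (a linear probe) to keep the example short.

```python
import math


def pretrained_image_encoder(image):
    """Hypothetical stand-in for the image branch learned by image-text pre-training.
    Here it returns the input unchanged; in practice it would be a CNN."""
    return image


def train_linear_head(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression head on frozen features with full-batch gradient descent."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(image, w, b):
    """Inference uses the image alone: no textual report is needed at this stage."""
    x = pretrained_image_encoder(image)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


if __name__ == "__main__":
    # A tiny labeled set standing in for the "small labeled dataset"
    train_images = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
    labels = [1, 1, 0, 0]
    feats = [pretrained_image_encoder(im) for im in train_images]
    w, b = train_linear_head(feats, labels)
    print(predict([0.95, 0.05], w, b))  # 1
    print(predict([0.05, 0.9], w, b))   # 0
```

The point of the split is that reports are consumed only during pre-training; the deployed classifier needs images alone, matching the abstract's claim that textual reports are not required at inference.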
