Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Yogesh Kumar; Pekka Marttinen

arXiv:2403.10153 · cs.CV · July 16, 2024 · 1 citation

TL;DR

eCLIP enhances medical multi-modal contrastive learning by integrating expert radiologist annotations, notably eye-gaze heatmaps, to improve embedding quality, address data scarcity, and bridge the modality gap, leading to better performance on cross-modal tasks.

Contribution

The paper introduces eCLIP, a novel method that incorporates expert annotations into CLIP for medical imaging, improving multi-modal representations without altering the core architecture.

Findings

01. Improved embedding alignment and uniformity across tasks.

02. Enhanced zero-shot and retrieval performance in medical imaging.

03. Effective utilization of scarce expert annotations through mixup augmentation.
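
The mixup idea in the last finding can be illustrated with a minimal sketch. This is not the eCLIP implementation: the helper name `mixup_with_heatmap`, the default `alpha=0.4`, and the use of plain elementwise masking in place of the paper's heatmap processor are all assumptions made for illustration. The sketch blends an image with its heatmap-masked view using a coefficient drawn from a Beta distribution, as in standard mixup.

```python
import random


def mixup_with_heatmap(image, heatmap, alpha=0.4, rng=random):
    """Blend an image with its heatmap-masked view (hypothetical helper).

    `image` and `heatmap` are 2-D lists of floats with the same shape.
    Heatmap values are assumed to lie in [0, 1].
    """
    same_rows = len(image) == len(heatmap)
    if not same_rows or any(len(r) != len(h) for r, h in zip(image, heatmap)):
        raise ValueError("image and heatmap must have the same shape")

    # Assumption: elementwise masking stands in for the heatmap processor.
    masked = [
        [p * w for p, w in zip(row, hrow)]
        for row, hrow in zip(image, heatmap)
    ]

    # Standard mixup draws the blend coefficient from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)

    mixed = [
        [lam * p + (1.0 - lam) * m for p, m in zip(row, mrow)]
        for row, mrow in zip(image, masked)
    ]
    return mixed, lam


if __name__ == "__main__":
    image = [[1.0, 0.5], [0.2, 0.8]]
    heatmap = [[1.0, 0.0], [0.5, 1.0]]
    mixed, lam = mixup_with_heatmap(image, heatmap, rng=random.Random(0))
    print(lam, mixed)
```

Pixels where the heatmap is 1 are unchanged by the blend, while pixels the expert did not attend to are scaled by `lam`, so a single annotated image can yield many distinct training views.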

Abstract

We introduce eCLIP, an enhanced version of the CLIP model that integrates expert annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in contrastive multi-modal medical imaging analysis, notably data scarcity and the "modality gap" -- a significant disparity between image and text embeddings that diminishes the quality of representations and hampers cross-modal interoperability. eCLIP integrates a heatmap processor and leverages mixup augmentation to efficiently utilize the scarce expert annotations, thus boosting the model's learning effectiveness. eCLIP is designed to be generally applicable to any variant of CLIP without requiring any modifications of the core architecture. Through detailed evaluations across several tasks, including zero-shot inference, linear probing, cross-modal retrieval, and Retrieval Augmented Generation (RAG) of radiology reports…
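
The "modality gap" and the alignment and uniformity of embeddings can be made concrete with a small sketch. The source does not state which definitions the evaluation uses, so the ones below are assumptions: the gap is taken as the distance between the centroids of L2-normalized image and text embeddings, alignment as the mean squared distance between matched pairs, and uniformity as the log of the mean Gaussian potential over all pairs. All function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Coordinate-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def modality_gap(image_embs, text_embs):
    """Distance between centroids of normalized image and text embeddings."""
    img = [l2_normalize(v) for v in image_embs]
    txt = [l2_normalize(v) for v in text_embs]
    return math.sqrt(sq_dist(centroid(img), centroid(txt)))


def alignment(image_embs, text_embs):
    """Mean squared distance between matched (image, text) pairs; lower is better."""
    dists = [
        sq_dist(l2_normalize(a), l2_normalize(b))
        for a, b in zip(image_embs, text_embs)
    ]
    return sum(dists) / len(dists)


def uniformity(embs, t=2.0):
    """Log of the mean exp(-t * squared distance) over distinct pairs; lower is more uniform."""
    z = [l2_normalize(v) for v in embs]
    vals = [
        math.exp(-t * sq_dist(z[i], z[j]))
        for i in range(len(z))
        for j in range(i + 1, len(z))
    ]
    return math.log(sum(vals) / len(vals))


if __name__ == "__main__":
    images = [[1.0, 0.1], [0.9, 0.2]]
    texts = [[0.1, 1.0], [0.2, 0.9]]
    print(modality_gap(images, texts), alignment(images, texts), uniformity(images + texts))
```

Under these assumed definitions, a model that narrows the modality gap moves the two centroids closer together, while good uniformity keeps embeddings spread over the unit sphere rather than collapsed into a small region.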

Code & Models

Repositories

ykumards/eclip (PyTorch, official)

Taxonomy

Topics: Topic Modeling

Methods: Heatmap · Mixup · Contrastive Language-Image Pre-training