Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images

Mansi Kakkar; Dattesh Shanbhag; Chandan Aladahalli; Gurunath Reddy M

arXiv:2405.20735 · cs.CV · June 3, 2024 · 1 citation

Open Access

TL;DR

This paper enhances CLIP-based models to generate comprehensive, standardized descriptions of entire-body radiological images, significantly improving multi-modal anatomy detection in medical imaging.

Contribution

It introduces a novel approach to automate whole-body multi-modal descriptions using CLIP, filling a gap in existing clinical image captioning research.

Findings

01. Achieved a 47.6% performance improvement over baseline models.

02. Enhanced the correlation between organs and body stations through model augmentation.

03. Validated the effectiveness of image and language augmentations in medical image description.

Abstract

Vision-language models have emerged as a powerful tool for previously challenging multi-modal classification problems in the medical domain. This development has led to the exploration of automated image description generation for multi-modal clinical scans, particularly for radiology report generation. Existing research has focused on clinical descriptions for specific modalities or body regions, leaving a gap for a model that provides entire-body multi-modal descriptions. In this paper, we address this gap by automating the generation of standardized body station(s) and a list of organ(s) across the whole body in multi-modal MR and CT radiological images. Leveraging the versatility of Contrastive Language-Image Pre-training (CLIP), we refine and augment the existing approach through multiple experiments, including baseline model fine-tuning, adding station(s) as a superset for better…
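To make the CLIP mechanism mentioned in the abstract concrete, here is a minimal sketch of how a CLIP-style model ranks candidate text descriptions for one image: cosine similarity between the image embedding and each text embedding, scaled by a temperature and passed through a softmax. It is an illustration under stated assumptions, not the authors' pipeline. The prompt wording, the toy 3-dimensional embeddings, and the helper names `l2_normalize` and `clip_style_scores` are all hypothetical; a real system would obtain the embeddings from trained CLIP image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def clip_style_scores(image_emb, text_embs, temperature=0.07):
    """Return a softmax distribution over candidate texts for one image.

    The temperature of 0.07 is assumed here as a typical CLIP starting value;
    it is not taken from the paper.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(cosine / temperature)
    # Subtract the maximum logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical prompts pairing a body station with a list of organs.
prompts = [
    "an MR image of the abdomen station showing liver and kidneys",
    "a CT image of the chest station showing lungs and heart",
    "an MR image of the head station showing brain",
]

# Toy embeddings standing in for real encoder outputs (assumption).
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.0]

probs = clip_style_scores(image_emb, text_embs)
best = max(range(len(prompts)), key=lambda i: probs[i])
print(prompts[best])
```

With these toy vectors the image embedding lies closest to the first text embedding, so the first prompt receives the highest probability. Which prompts and label sets the paper actually uses cannot be recovered from the truncated abstract.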

Taxonomy

Topics: Multimodal Machine Learning Applications