CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare

Akash Ghosh; Arkadeep Acharya; Raghav Jain; Sriparna Saha; Aman Chadha; Setu Sinha

arXiv:2312.11541 · cs.AI · December 20, 2023 · 1 citation

TL;DR

This paper introduces a multimodal framework combining CLIP and LLMs to generate medical question summaries that incorporate visual information, improving understanding and decision-making in healthcare.

Contribution

It presents the MMQS dataset pairing medical queries with visual aids and a novel multimodal summarization framework utilizing CLIP and LLMs for enhanced medical query understanding.

Findings

01 Visual cues improve summary quality

02 Multimodal approach enhances medical understanding

03 Framework outperforms text-only methods

Abstract

In the era of modern healthcare, swiftly generating medical question summaries is crucial for informed and timely patient care. Despite the increasing complexity and volume of medical data, existing studies have focused solely on text-based summarization, neglecting the integration of visual information. Recognizing the untapped potential of combining textual queries with visual representations of medical conditions, we introduce the Multimodal Medical Question Summarization (MMQS) Dataset. This dataset, a major contribution of our work, pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. We also propose a framework, utilizing the power of Contrastive Language-Image Pretraining (CLIP) and Large Language Models (LLMs), consisting of four modules that identify medical disorders, generate relevant context, filter medical concepts,…
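The abstract says the first module identifies medical disorders with CLIP but does not spell out how. A common way to use CLIP for this is zero-shot classification: embed the image and a set of candidate label texts, then rank labels by cosine similarity. The sketch below illustrates only that general scoring step, not the authors' implementation. The embeddings are toy vectors standing in for real CLIP outputs, and the label names, the `zero_shot_scores` helper, and the `temperature` value are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def zero_shot_scores(image_emb, label_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities (CLIP-style scoring).

    `temperature` is an assumed value; real CLIP models learn their own scale.
    """
    sims = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(s / temperature for s in sims.values())
    exps = {label: math.exp(s / temperature - max_logit) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy stand-ins for CLIP embeddings; the labels are hypothetical examples.
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "skin rash": [1.0, 0.0, 0.0],
    "swollen eye": [0.0, 1.0, 0.0],
    "mouth ulcer": [0.0, 0.0, 1.0],
}

probs = zero_shot_scores(image_emb, label_embs)
best = max(probs, key=probs.get)
print(best)
```

In a pipeline of the kind the abstract describes, the top-ranked label would presumably be passed on as context to the later modules; with a real CLIP model, the toy vectors would be replaced by the model's image and text embeddings.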

Code & Models

Repositories

akashghosh/clipsyntel-aaai2024
none · Official

Datasets

ArkaAcharya/MMQSD_ClipSyntel
dataset · 13 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Text and Document Classification Technologies

Methods: Contrastive Language-Image Pre-training · Attentive Walk-Aggregating Graph Neural Network