MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration

Lai Wei; Wenkai Wang; Xiaoyu Shen; Yu Xie; Zhihao Fan; Xiaojin Zhang; Zhongyu Wei; Wei Chen

arXiv:2410.04521·cs.CV·October 8, 2024·3 cites

Open Access · 1 Repo

TL;DR

This paper presents MC-CoT, a modular framework that enhances zero-shot medical visual question answering by integrating large language models with multimodal models, improving reasoning and accuracy without task-specific fine-tuning.

Contribution

The paper introduces MC-CoT, a novel modular framework that combines LLMs and MLLMs to improve zero-shot Med-VQA performance through collaborative reasoning and knowledge integration.

Findings

1. MC-CoT outperforms standalone MLLMs in recall and accuracy.
2. Incorporating background knowledge improves reasoning in zero-shot Med-VQA.
3. MC-CoT surpasses existing multimodal CoT frameworks on multiple datasets.

Abstract

Recently, multimodal large language models (MLLMs) have been fine-tuned on specific medical image datasets to address medical visual question answering (Med-VQA) tasks. However, task-specific fine-tuning is costly and requires a separate model for each downstream task, which limits the exploration of zero-shot capabilities. In this paper, we introduce MC-CoT, a modular cross-modal collaboration Chain-of-Thought (CoT) framework designed to enhance the zero-shot performance of MLLMs in Med-VQA by leveraging large language models (LLMs). MC-CoT improves reasoning and information extraction by integrating medical knowledge and task-specific guidance: the LLM provides complex medical reasoning chains, and the MLLM provides observations of medical images based on the LLM's instructions. Our experiments on datasets such as SLAKE, VQA-RAD, and…
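The division of labor described in the abstract (the LLM issues instructions and reasons, the MLLM reports what it sees in the image) can be illustrated with a minimal sketch. This is not the authors' implementation, which lives in the official repository; the function `collaborative_answer`, the three-step order, the prompt wording, and the stand-in models `fake_llm` and `fake_mllm` are all assumptions made here for illustration.

```python
from typing import Callable, List

# Hypothetical interfaces: any text-only LLM and any image+text MLLM client
# could be wrapped to match these signatures.
LLM = Callable[[str], str]
MLLM = Callable[[str, str], str]  # (image_path, instruction) -> observation


def collaborative_answer(question: str, image_path: str, llm: LLM, mllm: MLLM) -> str:
    # Step 1 (assumed): the LLM turns the question into observation
    # instructions, one per line.
    plan = llm(f"List what to look for in the image to answer: {question}")
    instructions: List[str] = [line.strip() for line in plan.splitlines() if line.strip()]

    # Step 2 (assumed): the MLLM inspects the image once per instruction.
    observations = [mllm(image_path, instruction) for instruction in instructions]

    # Step 3 (assumed): the LLM reasons over the collected observations
    # and produces the final answer.
    notes = "\n".join(f"- {obs}" for obs in observations)
    return llm(f"Question: {question}\nObservations:\n{notes}\nAnswer:")


# Stand-in models so the sketch runs without any model or API access.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("List"):
        return "Describe the lung fields.\nDescribe the heart size."
    return f"final answer from {prompt.count('- ')} observations"


def fake_mllm(image_path: str, instruction: str) -> str:
    return f"[{image_path}] {instruction}"


if __name__ == "__main__":
    print(collaborative_answer("Is the heart enlarged?", "scan.png", fake_llm, fake_mllm))
```

Passing the two models in as plain callables keeps the modules swappable, which matches the modular framing in the abstract; a real setup would replace the stand-ins with actual model clients.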

Code & Models

Repositories

thomaswei-cn/MC-CoT
PyTorch · Official

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare