MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration
Lai Wei, Wenkai Wang, Xiaoyu Shen, Yu Xie, Zhihao Fan, Xiaojin Zhang, Zhongyu Wei, Wei Chen

TL;DR
This paper presents MC-CoT, a modular framework that enhances zero-shot medical visual question answering by integrating large language models with multimodal models, improving reasoning and accuracy without task-specific fine-tuning.
Contribution
The paper introduces MC-CoT, a novel modular framework that combines LLMs and MLLMs to improve zero-shot Med-VQA performance through collaborative reasoning and knowledge integration.
Findings
MC-CoT outperforms standalone MLLMs in recall and accuracy.
Incorporating background knowledge improves reasoning in zero-shot Med-VQA.
MC-CoT surpasses existing multimodal CoT frameworks on multiple datasets.
Abstract
In recent advancements, multimodal large language models (MLLMs) have been fine-tuned on specific medical image datasets to address medical visual question answering (Med-VQA) tasks. However, this common approach of task-specific fine-tuning is costly and necessitates separate models for each downstream task, limiting the exploration of zero-shot capabilities. In this paper, we introduce MC-CoT, a modular cross-modal collaboration Chain-of-Thought (CoT) framework designed to enhance the zero-shot performance of MLLMs in Med-VQA by leveraging large language models (LLMs). MC-CoT improves reasoning and information extraction by integrating medical knowledge and task-specific guidance, where LLM provides various complex medical reasoning chains and MLLM provides various observations of medical images based on instructions of the LLM. Our experiments on datasets such as SLAKE, VQA-RAD, and…
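The division of labor described in the abstract (the LLM supplies instructions and reasoning, the MLLM supplies image observations that follow those instructions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `collaborative_answer`, the `Trace` container, the prompt wording, and the stub models are all hypothetical, and the models are stand-in callables so the example runs without any real LLM or MLLM.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: a real system would wrap actual model APIs here.
LLM = Callable[[str], str]         # prompt -> text
MLLM = Callable[[str, str], str]   # (image reference, instruction) -> observation


@dataclass
class Trace:
    instructions: List[str]
    observations: List[str]
    answer: str


def collaborative_answer(question: str, image_ref: str, llm: LLM, mllm: MLLM) -> Trace:
    """Illustrative three-step loop; the prompts are assumptions, not the paper's."""
    # Step 1: the LLM turns the question into image-inspection instructions,
    # one per line.
    plan = llm(f"List what to look for in the image to answer: {question}")
    instructions = [line.strip() for line in plan.splitlines() if line.strip()]

    # Step 2: the MLLM examines the image once per instruction.
    observations = [mllm(image_ref, instruction) for instruction in instructions]

    # Step 3: the LLM reasons over the collected observations to produce an answer.
    joined = "\n".join(observations)
    answer = llm(f"Question: {question}\nObservations:\n{joined}\nAnswer:")
    return Trace(instructions, observations, answer)


def stub_llm(prompt: str) -> str:
    # Stand-in for a real LLM: returns canned text based on the prompt type.
    if prompt.startswith("List"):
        return "Identify the organ shown\nDescribe any visible abnormality"
    return "stub answer"


def stub_mllm(image_ref: str, instruction: str) -> str:
    # Stand-in for a real MLLM: echoes the instruction instead of reading an image.
    return f"[{image_ref}] {instruction}: (no real observation)"


if __name__ == "__main__":
    trace = collaborative_answer("Is there a lesion?", "scan.png", stub_llm, stub_mllm)
    for instruction, observation in zip(trace.instructions, trace.observations):
        print(instruction, "->", observation)
    print("Answer:", trace.answer)
```

Keeping the two models behind plain callables means either one could be swapped without changing the loop, which is one plausible reading of "modular" in this setting.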
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
