Can Large Language Models Challenge CNNs in Medical Image Analysis?
Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das

TL;DR
This paper compares CNNs and large language models (LLMs) for medical image analysis. CNNs generally outperform the multimodal models, but applying additional filtering on top of the LLMs can improve their diagnostic performance and efficiency.
Contribution
The paper compares CNNs and LLMs on medical diagnostic image classification across diagnostic performance, execution efficiency, and environmental impact, and shows how additional filtering can improve LLM performance.
Findings
CNNs outperform multimodal models in accuracy and efficiency
Applying additional filtering on top of LLMs improves their diagnostic performance
Multimodal AI can enhance clinical diagnostic scalability
Abstract
This study presents a multimodal AI framework designed for precisely classifying medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impacts. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated emission. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability,…
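The abstract names accuracy and F1-score among its evaluation metrics. As a rough illustration of what those two numbers measure, here is a minimal Python sketch. The class labels ("covid", "normal"), the function names, and the choice of macro-averaged F1 are assumptions made for this example; the abstract does not say which F1 variant or which label sets the authors used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical labels, for illustration only (not data from the paper).
y_true = ["covid", "normal", "covid", "normal"]
y_pred = ["covid", "covid", "covid", "normal"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

On imbalanced diagnostic datasets the two metrics can diverge noticeably, which is one common reason to report both.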
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection · COVID-19 diagnosis using AI
