Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification

Xun Zhu; Fanbin Mo; Xi Chen; Kaili Zheng; Shaoshuai Yang; Yiming Shi; Jian Gao; Miao Li; Ji Wu

arXiv:2604.08333 · cs.CV · April 10, 2026

TL;DR

This paper investigates why medical multimodal large language models underperform in image classification, revealing four key failure modes through extensive experiments and feature probing.

Contribution

It is the first study to dissect the causes of performance degradation in medical MLLMs. It introduces quantitative scores for the healthiness of feature evolution and offers insights into what blocks clinical deployment.

Findings

01

Identified four failure modes, concerning visual representation quality, connector fidelity, LLM reasoning, and semantic alignment.

02

Developed quantitative scores to assess feature evolution healthiness.

03

Provided insights into barriers preventing clinical deployment of medical MLLMs.

Abstract

The rise of multimodal large language models (MLLMs) has sparked an unprecedented wave of applications in the field of medical imaging analysis. However, as one of the earliest and most fundamental tasks integrated into this paradigm, medical image classification reveals a sobering reality: state-of-the-art medical MLLMs consistently underperform compared to traditional deep learning models, despite their overwhelming advantages in pre-training data and model parameters. This paradox prompts a critical rethinking: where exactly does the performance degradation originate? In this paper, we conduct extensive experiments on 14 open-source medical MLLMs across three representative image classification datasets. Moving beyond superficial performance benchmarking, we employ feature probing to track the information flow of visual features module-by-module and layer-by-layer throughout the…
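As a rough illustration of what feature probing involves, the sketch below fits a small linear classifier on frozen features taken from each stage of a model and compares held-out accuracy across stages. It is a minimal sketch under stated assumptions, not the authors' procedure: the ridge least-squares probe, the function names, the stage names, and the synthetic data are all hypothetical, and the paper's own probes and feature-health scores are not specified in this excerpt.

```python
import numpy as np


def fit_linear_probe(features, labels, num_classes, ridge=1e-3):
    """Fit a ridge-regularised least-squares classifier on frozen features."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # bias column
    Y = np.eye(num_classes)[labels]  # one-hot targets
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


def probe_accuracy(weights, features, labels):
    """Accuracy of a fitted probe on the given features and labels."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    predictions = (X @ weights).argmax(axis=1)
    return float((predictions == labels).mean())


def probe_all_stages(stage_features, labels, num_classes,
                     train_fraction=0.7, seed=0):
    """Probe every stage with the same train/test split; return accuracies."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    cut = int(train_fraction * len(labels))
    train_idx, test_idx = order[:cut], order[cut:]
    results = {}
    for name, feats in stage_features.items():
        weights = fit_linear_probe(feats[train_idx], labels[train_idx],
                                   num_classes)
        results[name] = probe_accuracy(weights, feats[test_idx],
                                       labels[test_idx])
    return results


def make_synthetic_stages(num_samples=300, num_classes=3, dim=8, seed=0):
    """Synthetic stand-in data (an assumption, not data from the paper).

    One stage keeps class-separable structure; the other is pure noise.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, num_classes, size=num_samples)
    centers = 5.0 * np.eye(num_classes, dim)
    separable = centers[labels] + rng.normal(size=(num_samples, dim))
    noise = rng.normal(size=(num_samples, dim))
    return {"stage_a_separable": separable, "stage_b_noise": noise}, labels


if __name__ == "__main__":
    stages, labels = make_synthetic_stages()
    for name, acc in probe_all_stages(stages, labels, num_classes=3).items():
        print(f"{name}: held-out probe accuracy = {acc:.2f}")
```

In a real setting the dictionary would hold features extracted from the vision encoder, the connector, and successive LLM layers for the same images. A drop in probe accuracy between two adjacent stages would then suggest where class-relevant information is being lost.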
