MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging

Zhijie Bao; Fangke Chen; Licheng Bao; Chenhui Zhang; Wei Chen; Jiajie Peng; Zhongyu Wei

arXiv:2604.13756·cs.CL·April 16, 2026

TL;DR

MedRCube introduces a multidimensional evaluation framework for medical imaging MLLMs, revealing insights into model reasoning and trustworthiness that surpass prior coarse metrics.

Contribution

It presents a novel, fine-grained evaluation paradigm and benchmark for medical imaging MLLMs, including a credibility subset and insights into model reasoning behaviors.

Findings

01

Lingshu-32B achieves top-tier performance among 33 MLLMs.

02

MedRCube exposes new insights into model reasoning and reliability.

03

A positive correlation between shortcut behavior and diagnostic accuracy was found.

Abstract

The potential of Multimodal Large Language Models (MLLMs) in the domain of medical imaging raises the demand for systematic and rigorous evaluation frameworks that are aligned with real-world medical imaging practice. Existing practices that report single or coarse-grained metrics lack the granularity required for specialized clinical support and fail to assess the reliability of reasoning mechanisms. To address this, we propose a paradigm shift toward multidimensional, fine-grained, and in-depth evaluation. Based on a two-stage systematic construction pipeline designed for this paradigm, we instantiate it with MedRCube. We benchmark 33 MLLMs, among which Lingshu-32B achieves top-tier performance. Crucially, MedRCube exposes a series of pronounced insights inaccessible under prior evaluation settings. Furthermore, we introduce a credibility evaluation subset to quantify reasoning…

Code & Models

Repositories

F1mc/MedRCube
github

Datasets

Flmc/MedRCube
dataset· 1.7k dl
