TL;DR
X-PCR introduces a comprehensive benchmark for evaluating multi-modal large language models' clinical reasoning in ophthalmology, emphasizing progressive reasoning and cross-modal integration across diverse imaging modalities.
Contribution
X-PCR is presented as the first benchmark to cover a complete ophthalmology diagnostic workflow, combining two reasoning tasks with a large curated dataset for evaluating MLLMs.
Findings
The 21 MLLMs evaluated show critical gaps in clinical reasoning.
The benchmark covers 52 ophthalmic diseases across six imaging modalities, with 26,415 images and 177,868 expert-verified VQA pairs curated from 51 public datasets.
Evaluation highlights the need for improved models in progressive and cross-modal reasoning.
Abstract
Despite significant progress in Multi-modal Large Language Models (MLLMs), their clinical reasoning capacity for multi-modal diagnosis remains largely unexamined. Current benchmarks, which mostly rely on single-modality data, cannot evaluate the progressive reasoning and cross-modal integration essential for clinical practice. We introduce the Cross-Modality Progressive Clinical Reasoning (X-PCR) benchmark, the first comprehensive evaluation of MLLMs through a complete ophthalmology diagnostic workflow, with two reasoning tasks: 1) a six-stage progressive reasoning chain spanning image quality assessment to clinical decision-making, and 2) a cross-modality reasoning task integrating six imaging modalities. The benchmark comprises 26,415 images and 177,868 expert-verified VQA pairs curated from 51 public datasets, covering 52 ophthalmic diseases. Evaluation of 21 MLLMs reveals critical gaps in progressive…
