Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
Xinyao Liu, Diping Song

TL;DR
This paper presents FundusExpert, an ophthalmology-specific multimodal large language model (MLLM) with integrated positioning-diagnosis reasoning, trained on a new dataset, FundusGen. The model achieves superior performance on clinical ophthalmic tasks, and the results demonstrate the importance of data quality and cognitive alignment.
Contribution
Introduces FundusExpert, a specialized MLLM for ophthalmology, and FundusGen, a dataset with clinical reasoning annotations, to advance cross-modal understanding and interpretability in medical AI.
Findings
Surpasses the 40B-parameter MedRegA in accuracy by 26.6%.
Achieves 77.0% clinical consistency in zero-shot report generation.
Demonstrates a data quality-capability scaling law ($L \,\propto\, N^{0.068}$).
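To give a feel for how shallow an exponent of 0.068 is, the sketch below computes the factor by which $L$ changes when $N$ is multiplied by a given ratio, assuming only the proportionality $L \propto N^{0.068}$ stated above. This summary does not define $L$ or $N$, so the code treats them as generic quantities; the function name `scale_factor` is hypothetical and not taken from the paper.

```python
def scale_factor(n_ratio: float, exponent: float = 0.068) -> float:
    """Factor by which L changes when N is multiplied by n_ratio,
    under the assumed power law L ∝ N**exponent."""
    if n_ratio <= 0:
        raise ValueError("n_ratio must be positive")
    return n_ratio ** exponent


if __name__ == "__main__":
    # Illustrative ratios only; these are not values reported in the paper.
    for ratio in (2, 10, 100):
        print(f"N x{ratio}: L x{scale_factor(ratio):.3f}")
```

Under this assumption, doubling $N$ changes $L$ by roughly 5%, and a tenfold increase in $N$ changes it by roughly 17%.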
Abstract
Multimodal large language models (MLLMs) demonstrate significant potential in the field of medical diagnosis. However, they face critical challenges in specialized domains such as ophthalmology, particularly the fragmentation of annotation granularity and inconsistencies in clinical reasoning logic, which hinder precise cross-modal understanding. This paper introduces FundusExpert, an ophthalmology-specific MLLM with integrated positioning-diagnosis reasoning capabilities, along with FundusGen, a dataset constructed through the intelligent Fundus-Engine system. Fundus-Engine automates localization and leverages MLLM-based semantic expansion to integrate global disease classification, local object detection, and fine-grained feature analysis within a single fundus image. Additionally, by constructing a clinically aligned cognitive chain, it guides the model to generate interpretable…
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Electronic Health Records Systems · Speech and dialogue systems
