VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge
Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E. Kinahan, Yu Qiao

TL;DR
VisionUnite is a vision-language foundation model for ophthalmology that integrates clinical knowledge, is trained on large image-text datasets, and demonstrates diagnostic and educational capabilities comparable to those of junior ophthalmologists.
Contribution
Introduction of VisionUnite, a novel ophthalmology-focused vision-language foundation model enhanced with clinical knowledge and trained on extensive datasets, outperforming existing models in diagnostics and education.
Findings
Outperforms GPT-4V and Gemini Pro in diagnostics.
Demonstrates diagnostic capabilities similar to junior ophthalmologists.
Effective in clinical scenarios including multi-disease diagnosis and patient interaction.
Abstract
The need for improved diagnostic methods in ophthalmology is acute, especially in underdeveloped regions with limited access to specialists and advanced equipment. Therefore, we introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge. VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs, and further refined using our proposed MMFundus dataset, which includes 296,379 high-quality fundus image-text pairs and 889,137 simulated doctor-patient dialogue instances. Our experiments indicate that VisionUnite outperforms existing generative foundation models such as GPT-4V and Gemini Pro. It also demonstrates diagnostic capabilities comparable to those of junior ophthalmologists. VisionUnite performs well in various clinical scenarios including open-ended multi-disease diagnosis, clinical…
Taxonomy
Topics: Ophthalmology and Visual Health Research
