VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge

Zihan Li; Diping Song; Zefeng Yang; Deming Wang; Fei Li; Xiulan Zhang; Paul E. Kinahan; Yu Qiao

arXiv:2408.02865 · eess.IV · August 13, 2025 · 2 cites

TL;DR

VisionUnite is a vision-language foundation model for ophthalmology that integrates clinical knowledge. It is pretrained on 1.24 million image-text pairs, refined on the MMFundus dataset, and reported to show diagnostic capabilities comparable to those of junior ophthalmologists, along with educational uses.

Contribution

Introduction of VisionUnite, a novel ophthalmology-focused vision-language foundation model enhanced with clinical knowledge, pretrained on 1.24 million image-text pairs and refined on the proposed MMFundus dataset, outperforming existing generative models such as GPT-4V and Gemini Pro in diagnostics and education.

Findings

01 Outperforms GPT-4V and Gemini Pro in diagnostics.

02 Demonstrates diagnostic capabilities similar to junior ophthalmologists.

03 Effective in clinical scenarios including multi-disease diagnosis and patient interaction.

Abstract

The need for improved diagnostic methods in ophthalmology is acute, especially in underdeveloped regions with limited access to specialists and advanced equipment. We therefore introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge. VisionUnite is pretrained on an extensive dataset of 1.24 million image-text pairs and further refined on our proposed MMFundus dataset, which includes 296,379 high-quality fundus image-text pairs and 889,137 simulated doctor-patient dialogue instances. Our experiments indicate that VisionUnite outperforms existing generative foundation models such as GPT-4V and Gemini Pro. It also demonstrates diagnostic capabilities comparable to those of junior ophthalmologists. VisionUnite performs well in various clinical scenarios including open-ended multi-disease diagnosis, clinical…

Code & Models

Repositories

HUANGLIZI/VisionUnite (PyTorch, official)

Taxonomy

Topics: Ophthalmology and Visual Health Research