An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care
Zhi Da Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, Can Can Xue, Rachel Shujuan Chong, Quan V. Hoang, Lavanya Raghavan, Yih Chung Tham, Charumathi Sabanayagam, Wei-Chi Wu, Ming-Chih Ho, Jiangnan He, Preeti Gupta

TL;DR
Meta-EyeFM integrates a large language model with vision foundation models so that users can request ocular disease diagnosis and triaging in primary eye care through text queries, with each query routed to a task-specific vision model.
Contribution
The paper introduces Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment and triaging.
Findings
Achieved 100% accuracy in routing fundus images to appropriate VFMs.
VFMs achieved ≥82.2% accuracy in disease detection.
Meta-EyeFM was 11% to 43% more accurate than Gemini-1.5-flash and ChatGPT-4o in detecting various eye diseases.
Abstract
Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptation, we fine-tuned our VFMs to detect ocular and systemic diseases, differentiate ocular disease severity, and identify common ocular signs. The model achieved 100% accuracy in routing fundus images to appropriate VFMs, which achieved 82.2% accuracy in disease detection, 89% in severity differentiation, and 76% in sign identification. Meta-EyeFM was 11% to 43% more accurate than Gemini-1.5-flash and ChatGPT-4o LMMs in detecting various eye diseases and comparable to an…
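The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: in Meta-EyeFM an LLM interprets the text query, whereas here a simple keyword matcher stands in for it. The task names, keyword lists, and the `route_query` and `analyse` helpers are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical task names and trigger keywords (assumptions, not from the paper).
TASK_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "disease_detection": ("detect", "screen", "diagnos"),
    "severity_differentiation": ("severity", "grade", "stage"),
    "sign_identification": ("sign", "lesion", "haemorrhage", "hemorrhage"),
}


def route_query(query: str) -> str:
    """Pick the task whose keywords occur most often in the query.

    Keyword counting is a stand-in for the LLM-based router described
    in the abstract; it only illustrates the query-to-task mapping.
    """
    text = query.lower()
    scores = {
        task: sum(text.count(keyword) for keyword in keywords)
        for task, keywords in TASK_KEYWORDS.items()
    }
    best_task = max(scores, key=scores.get)
    if scores[best_task] == 0:
        raise ValueError(f"No task matches query: {query!r}")
    return best_task


def analyse(
    query: str,
    image_path: str,
    models: Dict[str, Callable[[str], str]],
) -> Tuple[str, str]:
    """Route the query, then call the matching task-specific model."""
    task = route_query(query)
    return task, models[task](image_path)


# Placeholder callables standing in for fine-tuned vision foundation models.
stub_models: Dict[str, Callable[[str], str]] = {
    task: (lambda path, task=task: f"{task} result for {path}")
    for task in TASK_KEYWORDS
}
```

A call such as `analyse("What is the severity grade of this glaucoma?", "fundus.png", stub_models)` would select the severity model in this sketch. A real system would replace both the keyword matcher and the stub callables with the LLM router and the fine-tuned VFMs.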
