An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

Zhi Da Soh; Yang Bai; Kai Yu; Yang Zhou; Xiaofeng Lei; Sahil Thakur; Zann Lee; Lee Ching Linette Phang; Qingsheng Peng; Can Can Xue; Rachel Shujuan Chong; Quan V. Hoang; Lavanya Raghavan; Yih Chung Tham; Charumathi Sabanayagam; Wei-Chi Wu; Ming-Chih Ho; Jiangnan He; Preeti Gupta; Ecosse Lamoureux; Seang Mei Saw; Vinay Nangia; Songhomitra Panda-Jonas; Jie Xu; Ya Xing Wang; Xinxing Xu; Jost B. Jonas; Tien Yin Wong; Rick Siow Mong Goh; Yong Liu; Ching-Yu Cheng

arXiv:2505.08414·eess.IV·May 14, 2025

TL;DR

Meta-EyeFM is an integrated language-vision foundation model that pairs a large language model with vision foundation models to diagnose and triage ocular disease in primary eye care. It routed fundus images to the appropriate vision model with 100% accuracy, and those models detected disease with at least 82.2% accuracy.

Contribution

The paper introduces Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment and triaging.

Findings

01

Meta-EyeFM routed fundus images to the appropriate VFMs with 100% accuracy.

02

VFMs achieved ≥82.2% accuracy in disease detection.

03

Meta-EyeFM was 11% to 43% more accurate than Gemini-1.5-flash and ChatGPT-4o in detecting various eye diseases.

Abstract

Current deep learning models are mostly task-specific and lack a user-friendly interface. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM uses a routing mechanism to select the appropriate task-specific analysis based on text queries. Using low-rank adaptation (LoRA), we fine-tuned our VFMs to detect ocular and systemic diseases, differentiate ocular disease severity, and identify common ocular signs. The model achieved 100% accuracy in routing fundus images to the appropriate VFMs, which achieved ≥82.2% accuracy in disease detection, ≥89% in severity differentiation, and ≥76% in sign identification. Meta-EyeFM was 11% to 43% more accurate than the large multimodal models (LMMs) Gemini-1.5-flash and ChatGPT-4o in detecting various eye diseases and comparable to an…
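
The routing idea in the abstract, where a text query selects which task-specific vision model analyzes the image, can be illustrated with a minimal sketch. The abstract does not describe how the routing mechanism is implemented, so simple keyword matching stands in for it here. The task names, keyword lists, `route_query`, `analyze`, and the stub models are all hypothetical and are not taken from the paper.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical keyword lists, one per task family named in the abstract.
# The abstract does not specify how routing is implemented; keyword
# matching is only a stand-in for illustration.
TASK_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "disease_detection": ("detect", "screen", "diagnose"),
    "severity_differentiation": ("severity", "grade", "stage"),
    "sign_identification": ("sign", "lesion", "hemorrhage"),
}


def route_query(query: str) -> str:
    """Pick the task whose keywords match the most words in the query.

    Ties are broken by the order of TASK_KEYWORDS.
    """
    words = re.findall(r"[a-z]+", query.lower())
    scores = {
        task: sum(word in keywords for word in words)
        for task, keywords in TASK_KEYWORDS.items()
    }
    best_task = max(scores, key=scores.get)
    if scores[best_task] == 0:
        raise ValueError(f"no task matches query: {query!r}")
    return best_task


def analyze(
    query: str, image: bytes, models: Dict[str, Callable[[bytes], str]]
) -> Tuple[str, str]:
    """Route the query, then run the selected (placeholder) vision model."""
    task = route_query(query)
    return task, models[task](image)


# Placeholder models that only echo their task name; real VFMs would go here.
STUB_MODELS: Dict[str, Callable[[bytes], str]] = {
    task: (lambda image, task=task: f"{task} output for {len(image)}-byte image")
    for task in TASK_KEYWORDS
}

if __name__ == "__main__":
    print(analyze("Please grade the severity of retinopathy.", b"\x00" * 4, STUB_MODELS))
```

Keeping the router separate from the models means each vision model can be fine-tuned or replaced independently, which matches the modular design the abstract describes. A real router would need to handle paraphrases and ambiguous queries that keyword matching cannot.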
