Toward Multimodal Conversational AI for Age-Related Macular Degeneration

Ran Gu; Benjamin Hou; Mélanie Hébert; Asmita Indurkar; Yifan Yang; Emily Y. Chew; Tiarnán D. L. Keenan; Zhiyong Lu

arXiv:2604.25720 · cs.CV · April 29, 2026

TL;DR

This study introduces OcularChat, a multimodal large language model (MLLM) fine-tuned to diagnose age-related macular degeneration (AMD) from color fundus photographs through interactive dialogue, and reports high accuracy, interpretable reasoning, and clinical relevance.

Contribution

The paper presents OcularChat, the first MLLM tailored for AMD diagnosis that combines visual question answering with clinical reasoning and interactive dialogue capabilities.

Findings

1. OcularChat reached accuracies of 0.954, 0.849, and 0.678 on three AMD classification tasks in AREDS.

2. It outperformed existing models in both objective and subjective evaluations.

3. It provides clinically meaningful explanations and reasoning.

Abstract

Despite the strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multimodal large language models (MLLMs) make it possible to integrate diagnostic predictions with clinically meaningful dialogue to support clinical decision-making and patient counseling. In this study, OcularChat, an MLLM, was fine-tuned from Qwen2.5-VL on simulated patient-physician dialogues to diagnose age-related macular degeneration (AMD) through visual question answering on color fundus photographs (CFPs). A total of 705,850 simulated dialogues paired with 46,167 CFPs were generated to train OcularChat to identify key AMD features and produce reasoned predictions. OcularChat demonstrated strong classification performance in AREDS, achieving accuracies of 0.954, 0.849, and 0.678 for the three diagnostic…
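The abstract describes training on simulated patient-physician dialogues, each paired with a CFP. The sketch below shows one plausible way to represent such a record and convert it into a chat-message list resembling the layout used in public Qwen2.5-VL examples (an image entry followed by a text entry in the user turn). It is not the authors' data format: the class names, field names, role mapping, and file name are all hypothetical. It also works out the average number of dialogues per image implied by the reported counts.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DialogueTurn:
    # Hypothetical role mapping: "user" = simulated patient, "assistant" = simulated physician.
    role: str
    text: str


@dataclass
class CFPDialogue:
    # Hypothetical record pairing one color fundus photograph with one simulated dialogue.
    image_path: str
    turns: list[DialogueTurn] = field(default_factory=list)

    def to_chat_messages(self) -> list[dict[str, Any]]:
        """Convert to a Qwen2.5-VL-style message list, attaching the image to the first user turn."""
        if not self.turns or self.turns[0].role != "user":
            raise ValueError("dialogue must start with a user turn")
        messages: list[dict[str, Any]] = []
        for i, turn in enumerate(self.turns):
            content: list[dict[str, Any]] = [{"type": "text", "text": turn.text}]
            if i == 0:
                # Put the image before the first question so it precedes the text in the prompt.
                content.insert(0, {"type": "image", "image": self.image_path})
            messages.append({"role": turn.role, "content": content})
        return messages


def dialogues_per_image(n_dialogues: int, n_images: int) -> float:
    """Average number of simulated dialogues generated per image."""
    return n_dialogues / n_images


record = CFPDialogue(
    image_path="cfp_example.jpg",  # hypothetical file name
    turns=[
        DialogueTurn("user", "Does this fundus photo show signs of AMD?"),
        DialogueTurn("assistant", "[answer naming key AMD features and a reasoned prediction]"),
    ],
)
messages = record.to_chat_messages()
print(len(messages), messages[0]["content"][0]["type"])  # prints: 2 image
print(f"{dialogues_per_image(705_850, 46_167):.1f}")  # prints: 15.3
```

Attaching the image only to the first user turn assumes each dialogue refers to a single CFP. The ratio of roughly 15.3 dialogues per image is an average only; the abstract does not say how the dialogues are distributed across images.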