Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge
Shuai Lu, Meng Wang, Jia Guo, Jiawei Du, Bo Liu, Shengzhu Yang, Weihang Zhang, Huazhu Fu, and Huiqi Li

TL;DR
This paper introduces EyExIn, a novel framework that enhances retinal vision language models with expert knowledge, improving medical reasoning accuracy and grounding visual evidence in ophthalmic diagnosis tasks.
Contribution
It proposes a Deep Expert Injection mechanism with dual-stream encoding and adaptive fusion to embed domain-specific knowledge into VLMs for ophthalmology.
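As a rough illustration of what "adaptive fusion" of two feature streams can look like, the sketch below blends a general-stream vector and an expert-stream vector with a learned sigmoid gate. This is a generic gated-fusion pattern, not the paper's formulation, which this summary does not specify. The names `gated_fusion`, `gate_w`, and `gate_b` are hypothetical, and the scalar gate is an assumption made for brevity.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(general: Vector, expert: Vector,
                 gate_w: Vector, gate_b: float) -> Vector:
    """Blend two feature vectors with a scalar gate (hypothetical sketch).

    gate  = sigmoid(gate_w . [general; expert] + gate_b)
    fused = gate * expert + (1 - gate) * general

    Assumption: a single scalar gate per token; a real model could just as
    well use a per-channel gate or cross-attention instead.
    """
    if len(general) != len(expert):
        raise ValueError("streams must have the same dimensionality")
    if len(gate_w) != 2 * len(general):
        raise ValueError("gate_w must cover the concatenated streams")

    # The gate is computed from both streams, so the mix adapts per input.
    concat = general + expert
    logit = sum(w * x for w, x in zip(gate_w, concat)) + gate_b
    gate = sigmoid(logit)

    return [gate * e + (1.0 - gate) * g for g, e in zip(general, expert)]


if __name__ == "__main__":
    general_feat = [1.0, 2.0]   # e.g. coarse anatomical features
    expert_feat = [3.0, 4.0]    # e.g. fine-grained pathology features
    # Zero weights and bias give gate = 0.5, i.e. an even blend.
    print(gated_fusion(general_feat, expert_feat, [0.0] * 4, 0.0))
```

With zero gate weights the output is the element-wise average of the two streams; a large positive bias pushes the result toward the expert stream, and a large negative bias toward the general stream.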
Findings
Outperforms proprietary systems on four benchmarks
Achieves state-of-the-art ophthalmic visual question answering accuracy
Effectively anchors visual evidence to reduce hallucinations
Abstract
Large Vision Language Models (LVLMs) show immense potential for automated ophthalmic diagnosis. However, their clinical deployment is severely hindered by a lack of domain-specific knowledge. In this work, we identify two structural deficiencies hindering reliable medical reasoning: 1) the Perception Gap, where general-purpose visual encoders fail to resolve fine-grained pathological cues (e.g., microaneurysms); and 2) the Reasoning Gap, where sparse visual evidence is progressively overridden by massive language priors in deeper transformer layers, leading to ungrounded hallucinations. To bridge these gaps, we propose EyExIn, a data-efficient framework designed to anchor retinal VLMs with expert knowledge via a Deep Expert Injection mechanism. Our architecture employs an Expert-Aware Dual-Stream encoding strategy that decouples visual representation into a general stream for anatomical…
Taxonomy
Topics: Multimodal Machine Learning Applications · Retinal Imaging and Analysis · Domain Adaptation and Few-Shot Learning
