Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge

Shuai Lu; Meng Wang; Jia Guo; Jiawei Du; Bo Liu; Shengzhu Yang; Weihang Zhang; Huazhu Fu; and Huiqi Li

arXiv:2603.07131 · cs.CV · March 20, 2026

Open Access

TL;DR

This paper introduces EyExIn, a framework that anchors retinal vision-language models with expert knowledge, improving medical reasoning accuracy and grounding answers in visual evidence for ophthalmic diagnosis tasks.

Contribution

It proposes a Deep Expert Injection mechanism with dual-stream encoding and adaptive fusion to embed domain-specific knowledge into VLMs for ophthalmology.
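To make the idea of adaptive fusion concrete, the following is a minimal sketch of blending a general feature vector with an expert feature vector through a learned gate. It is not the authors' implementation: the listing here does not specify how EyExIn fuses its streams, so the scalar sigmoid gate, the function name `gated_fusion`, and the parameters `gate_weights` and `gate_bias` are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    general: Sequence[float],
    expert: Sequence[float],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> Tuple[List[float], float]:
    """Blend two equal-length feature vectors with a scalar gate.

    Hypothetical illustration only: the gate is a sigmoid over a linear
    projection of the concatenated streams. A gate near 1 favors the
    expert stream, a gate near 0 favors the general stream.
    """
    if len(general) != len(expert):
        raise ValueError("general and expert must have the same length")
    if len(gate_weights) != 2 * len(general):
        raise ValueError("gate_weights must cover both concatenated streams")

    concatenated = list(general) + list(expert)
    logit = sum(w * x for w, x in zip(gate_weights, concatenated)) + gate_bias
    gate = sigmoid(logit)

    # Convex combination of the two streams, element by element.
    fused = [gate * e + (1.0 - gate) * g for g, e in zip(general, expert)]
    return fused, gate


# With zero weights and zero bias the gate is 0.5, so the result is the mean.
fused, gate = gated_fusion([1.0, 2.0], [3.0, 6.0], [0.0, 0.0, 0.0, 0.0])
```

In a trained model the gate parameters would be learned, and the gate would more likely be computed per token or per channel than as a single scalar. The scalar form is used here only to keep the arithmetic easy to follow.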

Findings

01. Outperforms proprietary systems on four benchmarks

02. Achieves state-of-the-art ophthalmic visual question answering accuracy

03. Effectively anchors visual evidence to reduce hallucinations

Abstract

Large Vision Language Models (LVLMs) show immense potential for automated ophthalmic diagnosis. However, their clinical deployment is severely hindered by a lack of domain-specific knowledge. In this work, we identify two structural deficiencies hindering reliable medical reasoning: 1) the Perception Gap, where general-purpose visual encoders fail to resolve fine-grained pathological cues (e.g., microaneurysms); and 2) the Reasoning Gap, where sparse visual evidence is progressively overridden by massive language priors in deeper transformer layers, leading to ungrounded hallucinations. To bridge these gaps, we propose EyExIn, a data-efficient framework designed to anchor retinal VLMs with expert knowledge via a Deep Expert Injection mechanism. Our architecture employs an Expert-Aware Dual-Stream encoding strategy that decouples visual representation into a general stream for anatomical…

Taxonomy

Topics: Multimodal Machine Learning Applications · Retinal Imaging and Analysis · Domain Adaptation and Few-Shot Learning