Cross-Modal Semantic-Enhanced Diffusion Framework for Diabetic Retinopathy Grading
Yiqun Wang (Beijing Jiaotong University)

TL;DR
This paper introduces a novel diffusion-based framework for diabetic retinopathy grading that leverages vision-language models and semantic conditioning to improve accuracy and robustness.
Contribution
It proposes a CLIP-guided semantic diffusion model with domain adaptation and cross-modal semantic conditioning for enhanced DR grading performance.
Findings
Achieved 87.5% accuracy on the APTOS 2019 dataset.
Outperformed existing diffusion-based and visual-only methods.
Validated the effectiveness of each module through ablation studies.
Abstract
Automated grading of diabetic retinopathy (DR) faces several critical challenges: subtle inter-grade visual distinctions in fine-grained lesion patterns, distributional discrepancies induced by heterogeneous imaging devices and acquisition conditions, and the inherent inability of purely visual approaches to exploit clinical semantic knowledge. In this paper, we propose CLIP-Guided Semantic Diffusion (CGSD), a DR grading framework that synergistically integrates vision-language pretraining with diffusion probabilistic modeling. We adopt a domain-specific vision-language model tailored for DR grading as the semantic guidance module and adapt it to the target domain via Low-Rank Adaptation (LoRA), effectively bridging the distributional gap between the pretrained model and the target dataset with only a minimal number of trainable parameters. Building on this foundation, we construct a…
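The Low-Rank Adaptation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `LoRALinear`, the `rank` and `alpha` values, and the 512-dimensional layer are assumptions chosen for illustration. It shows only the general LoRA idea, where a pretrained weight stays frozen and a small low-rank update is the only trainable part.

```python
import numpy as np


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a low-rank update.

    The effective weight is W + (alpha / rank) * B @ A, where only A and B
    would be trained. This is a generic LoRA sketch, not the CGSD code.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable, small random init
        self.B = np.zeros((out_dim, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus scaled low-rank path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size


# Assumed dimensions for illustration only.
pretrained = np.random.default_rng(1).normal(size=(512, 512))
layer = LoRALinear(pretrained, rank=4)
features = np.random.default_rng(2).normal(size=(2, 512))
output = layer.forward(features)
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly, and with these assumed sizes the trainable parameters number 4,096 against 262,144 frozen ones, which is the sense in which the adaptation uses only a small number of trainable parameters.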
