MultiRetNet: A Multimodal Vision Model and Deferral System for Staging Diabetic Retinopathy

Jeannie She; Katie Spivakovsky

arXiv:2507.14738 · cs.CV · July 22, 2025

TL;DR

MultiRetNet is a multimodal vision model that combines retinal images with socioeconomic and health data to stage diabetic retinopathy. A deferral system flags cases that need clinician review, with the aim of improving staging in underserved populations.

Contribution

The paper introduces MultiRetNet, a novel multimodal pipeline that integrates retinal imaging, socioeconomic factors, and comorbidity profiles with a clinical deferral system to improve DR staging and early detection in underserved groups.

Findings

01

Of the three multimodal fusion methods tested, fusion through a fully connected layer is the most versatile.

02

Contrastive learning on synthesized adversarial, low-quality images trains the deferral system to identify out-of-distribution samples.

03

The system maintains accuracy on low-quality images and improves early detection.
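
To make the first finding concrete, here is a minimal sketch of fusion through a fully connected layer: the features from each modality are concatenated and passed through one dense layer that produces a score per DR stage. Everything specific in it is an assumption for illustration, not the authors' implementation: the function names (`linear`, `fuse`, `init_layer`), the feature dimensions, the random weights, and the choice of five output stages are all hypothetical.

```python
import random


def linear(x, weights, bias):
    """Apply one fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fuse(image_features, tabular_features, weights, bias):
    """Concatenate both modalities, then mix them with a single dense layer."""
    joint = list(image_features) + list(tabular_features)
    return linear(joint, weights, bias)


def init_layer(n_in, n_out, seed=0):
    """Small random weights and zero biases (stand-in for trained parameters)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


# Hypothetical inputs: a 4-dim image embedding and 3 tabular features
# (e.g. encoded socioeconomic and comorbidity variables).
image_features = [0.2, -1.3, 0.7, 0.05]
tabular_features = [1.0, 0.0, 0.4]

NUM_STAGES = 5  # assumption: a five-grade DR scale
weights, bias = init_layer(len(image_features) + len(tabular_features), NUM_STAGES)

logits = fuse(image_features, tabular_features, weights, bias)
predicted_stage = max(range(NUM_STAGES), key=lambda k: logits[k])
```

In a trained model the image embedding would come from a vision encoder and the weights would be learned; the sketch only shows where the modalities meet.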

Abstract

Diabetic retinopathy (DR) is a leading cause of preventable blindness, affecting over 100 million people worldwide. In the United States, individuals from lower-income communities face a higher risk of progressing to advanced stages before diagnosis, largely due to limited access to screening. Comorbid conditions further accelerate disease progression. We propose MultiRetNet, a novel pipeline combining retinal imaging, socioeconomic factors, and comorbidity profiles to improve DR staging accuracy, integrated with a clinical deferral system for a clinical human-in-the-loop implementation. We experiment with three multimodal fusion methods and identify fusion through a fully connected layer as the most versatile methodology. We synthesize adversarial, low-quality images and use contrastive learning to train the deferral system, guiding the model to identify out-of-distribution samples…
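
The abstract does not spell out how the deferral system is trained or applied, so the following is only a sketch of one common pattern, under stated assumptions: a pairwise margin-based contrastive loss pulls embeddings of the same group together and pushes different groups (here, clean versus degraded images) apart, and a case is deferred to a clinician when its embedding lies far from the centroid of clean embeddings. The function names, the two-dimensional embeddings, the margin, and the threshold are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_group, margin=1.0):
    """Pairwise contrastive loss.

    Same-group pairs are penalized by squared distance; different-group
    pairs are penalized only if they are closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same_group:
        return d ** 2
    return max(0.0, margin - d) ** 2


def centroid(embeddings):
    """Component-wise mean of a list of embeddings."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def should_defer(embedding, clean_centroid, threshold):
    """Defer to a clinician if the embedding is far from the clean centroid."""
    return euclidean(embedding, clean_centroid) > threshold


# Hypothetical embeddings: three clean images and one degraded image.
clean = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0]]
degraded = [2.0, 2.0]

center = centroid(clean)
defer_degraded = should_defer(degraded, center, threshold=1.0)
defer_clean = should_defer(clean[0], center, threshold=1.0)
```

With these toy values the degraded embedding is flagged for review and the clean one is not; in practice the threshold would have to be tuned on held-out data.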
