MultiRetNet: A Multimodal Vision Model and Deferral System for Staging Diabetic Retinopathy
Jeannie She, Katie Spivakovsky

TL;DR
MultiRetNet is a multimodal vision model that combines retinal images with socioeconomic and health data to improve diabetic retinopathy staging, especially in underserved populations. A deferral system flags the cases that need clinician review.
Contribution
The paper introduces MultiRetNet, a novel multimodal pipeline with a clinical deferral system. It integrates retinal imaging, socioeconomic factors, and comorbidity profiles to improve diabetic retinopathy (DR) staging and early detection in underserved groups.
Findings
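Of the three multimodal fusion methods tested, fusion through a fully connected layer is the most versatile.
Contrastive learning on synthesized adversarial, low-quality images helps the deferral system identify out-of-distribution samples.
The system maintains accuracy on low-quality images and improves early detection.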
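To make the first finding concrete, here is a minimal sketch of fusion through a fully connected layer: an image embedding is concatenated with tabular features, and a single dense layer maps the joint vector to stage probabilities. This is an illustration, not the authors' implementation. The function names, the embedding size, the three tabular features, the random weights, and the choice of five output stages are all assumptions made for the example.

```python
import math
import random

def linear(x, weights, bias):
    """Fully connected layer: one output per weight row, y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_classify(image_emb, tabular, weights, bias):
    """Concatenate both modalities, then classify with one dense layer."""
    fused = list(image_emb) + list(tabular)
    return softmax(linear(fused, weights, bias))

rng = random.Random(0)
# Hypothetical inputs: an 8-dim image embedding and 3 tabular features
# (standing in for socioeconomic and comorbidity variables).
image_emb = [rng.gauss(0, 1) for _ in range(8)]
tabular = [0.3, 1.0, 0.0]

n_classes = 5  # assumed number of DR stages, not stated in the summary
in_dim = len(image_emb) + len(tabular)
# Untrained random weights, used only to show the shapes involved.
weights = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(n_classes)]
bias = [0.0] * n_classes

probs = fuse_and_classify(image_emb, tabular, weights, bias)
predicted_stage = max(range(n_classes), key=probs.__getitem__)
```

In a trained model the weights would be learned jointly with the image encoder. The point of the sketch is only that the dense layer sees both modalities at once and can weight them against each other.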
Abstract
Diabetic retinopathy (DR) is a leading cause of preventable blindness, affecting over 100 million people worldwide. In the United States, individuals from lower-income communities face a higher risk of progressing to advanced stages before diagnosis, largely due to limited access to screening. Comorbid conditions further accelerate disease progression. We propose MultiRetNet, a novel pipeline combining retinal imaging, socioeconomic factors, and comorbidity profiles to improve DR staging accuracy, integrated with a clinical deferral system for a clinical human-in-the-loop implementation. We experiment with three multimodal fusion methods and identify fusion through a fully connected layer as the most versatile methodology. We synthesize adversarial, low-quality images and use contrastive learning to train the deferral system, guiding the model to identify out-of-distribution samples…
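The deferral idea in the abstract can be sketched as follows. The abstract does not say which contrastive objective or deferral rule is used, so this example assumes a classic margin-based pairwise contrastive loss and a simple distance-to-centroid threshold. The embeddings, the margin, the threshold, and all function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same, margin=1.0):
    """Margin-based pairwise contrastive loss (an assumed choice).

    Pairs from the same group are pulled together; pairs from different
    groups are pushed apart until they are at least `margin` away.
    """
    d = euclidean(a, b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

def should_defer(embedding, centroid, threshold):
    """Defer to a clinician when the embedding is far from the clean centroid."""
    return euclidean(embedding, centroid) > threshold

# Hypothetical 2-dim embeddings: two clean images and one degraded image.
clean_a = [0.1, 0.0]
clean_b = [0.0, 0.1]
degraded = [2.0, 2.0]
centroid = [0.05, 0.05]  # assumed mean embedding of clean training images

pull_loss = contrastive_loss(clean_a, clean_b, same=True)
push_loss = contrastive_loss(clean_a, degraded, same=False)
defer_degraded = should_defer(degraded, centroid, threshold=0.5)
defer_clean = should_defer(clean_a, centroid, threshold=0.5)
```

Under these assumptions, training separates clean and degraded images in embedding space, so a distance threshold is enough to route unusual inputs to human review while clean inputs are staged automatically.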
