Multi-Modal Multi-Instance Learning for Retinal Disease Recognition
Xirong Li, Yang Zhou, Jie Wang, Hailan Lin, Jianchun Zhao, Dayong Ding, Weihong Yu, and Youxin Chen

TL;DR
This paper introduces a lightweight multi-modal deep learning approach for retinal disease recognition using fundus photos and OCT scans, effectively handling small datasets and improving interpretability.
Contribution
It proposes a novel Multi-Modal Multi-Instance Learning framework that fuses CFP and OCT data, with a pseudo sequence generation technique to enhance model performance and interpretability.
Findings
Effective multi-modal fusion for retinal disease recognition.
Model performs well on limited labeled data.
Improved interpretability through region relevance detection.
Abstract
This paper attacks an emerging challenge of multi-modal retinal disease recognition. Given a multi-modal case consisting of a color fundus photo (CFP) and an array of OCT B-scan images acquired during an eye examination, we aim to build a deep neural network that recognizes multiple vision-threatening diseases for the given case. As the diagnostic efficacy of CFP and OCT is disease-dependent, the network's ability of being both selective and interpretable is important. Moreover, as both data acquisition and manual labeling are extremely expensive in the medical domain, the network has to be relatively lightweight for learning from a limited set of labeled multi-modal samples. Prior art on retinal disease recognition focuses either on a single disease or on a single modality, leaving multi-modal fusion largely underexplored. We propose in this paper Multi-Modal Multi-Instance Learning…
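The abstract is truncated before the method is described, so the following is only a generic sketch of how attention-style multi-instance pooling could fuse the two modalities, not the authors' implementation. It assumes each CFP-derived instance and each OCT B-scan has already been encoded as a fixed-length feature vector, scores every instance with a linear layer, normalizes the scores with a softmax, and classifies the weighted average. The function names, feature dimension, random weights, and the sigmoid multi-label output are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vec, weights, bias):
    """Apply a linear layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def attention_mil_fuse(cfp_instances, oct_instances,
                       attn_w, attn_b, cls_w, cls_b):
    """Hypothetical fusion: pool all instances from both modalities.

    Returns per-disease probabilities and one attention weight per
    instance; the weights indicate which instances drove the decision.
    """
    instances = cfp_instances + oct_instances
    # One scalar relevance score per instance (linear layer).
    scores = [linear(x, [attn_w], [attn_b])[0] for x in instances]
    # Softmax turns scores into weights that sum to 1 across the bag.
    alphas = softmax(scores)
    dim = len(instances[0])
    # Bag-level feature: attention-weighted average of instance features.
    bag = [sum(a * x[d] for a, x in zip(alphas, instances))
           for d in range(dim)]
    logits = linear(bag, cls_w, cls_b)
    # Assumption: multi-label output, one independent sigmoid per disease.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return probs, alphas


# Toy data: 3 CFP-derived instances and 5 OCT B-scans, 4-dim features.
random.seed(0)
DIM, N_DISEASES = 4, 2
cfp = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
oct_scans = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
attn_w = [random.gauss(0, 1) for _ in range(DIM)]
cls_w = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_DISEASES)]

probs, alphas = attention_mil_fuse(cfp, oct_scans, attn_w, 0.0,
                                   cls_w, [0.0] * N_DISEASES)
print("disease probabilities:", probs)
print("instance weights (CFP first, then OCT):", alphas)
```

In a sketch like this, the attention weights are what would support interpretability: a large weight on a particular B-scan or CFP-derived instance suggests that instance was most relevant to the prediction for the case.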
Taxonomy
Methods: Softmax · Linear Layer
