Multi-Modal Multi-Instance Learning for Retinal Disease Recognition

Xirong Li; Yang Zhou; Jie Wang; Hailan Lin; Jianchun Zhao; Dayong Ding; Weihong Yu; Youxin Chen

arXiv:2109.12307 · cs.CV · September 28, 2021

TL;DR

This paper introduces a lightweight multi-modal deep learning approach for retinal disease recognition using fundus photos and OCT scans, effectively handling small datasets and improving interpretability.

Contribution

It proposes a novel Multi-Modal Multi-Instance Learning framework that fuses CFP and OCT data, with a pseudo sequence generation technique to enhance model performance and interpretability.

Findings

01 Effective multi-modal fusion for retinal disease recognition.

02 Model performs well on limited labeled data.

03 Improved interpretability through region relevance detection.

Abstract

This paper attacks an emerging challenge of multi-modal retinal disease recognition. Given a multi-modal case consisting of a color fundus photo (CFP) and an array of OCT B-scan images acquired during an eye examination, we aim to build a deep neural network that recognizes multiple vision-threatening diseases for the given case. As the diagnostic efficacy of CFP and OCT is disease-dependent, the network's ability to be both selective and interpretable is important. Moreover, as both data acquisition and manual labeling are extremely expensive in the medical domain, the network has to be relatively lightweight for learning from a limited set of labeled multi-modal samples. Prior art on retinal disease recognition focuses either on a single disease or on a single modality, leaving multi-modal fusion largely underexplored. We propose in this paper Multi-Modal Multi-Instance Learning…
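
The task setting in the abstract (one CFP plus an array of OCT B-scans per case, several diseases to recognize at once) can be illustrated with a generic attention-based multi-instance pooling step. The sketch below is an assumption for illustration only, not the architecture from the paper: it presumes each image has already been encoded as a feature vector, and the names `attention_pool`, `predict_case`, `w_att`, `w_cls`, and `b_cls` are hypothetical. A linear layer scores each instance, a softmax turns the scores into weights, and a per-disease sigmoid gives multi-label outputs.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances: Sequence[Vector],
                   w_att: Vector) -> Tuple[Vector, List[float]]:
    """Score each instance with a linear layer (w_att), normalize the
    scores with softmax, and return the weighted sum of instances
    together with the per-instance weights."""
    weights = softmax([dot(w_att, h) for h in instances])
    dim = len(instances[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, instances))
              for d in range(dim)]
    return pooled, weights


def predict_case(cfp_feats: Sequence[Vector],
                 oct_feats: Sequence[Vector],
                 w_att: Vector,
                 w_cls: Sequence[Vector],
                 b_cls: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Treat CFP and OCT feature vectors as one bag of instances
    (an assumed fusion strategy), pool them with attention, and apply
    one sigmoid unit per disease so labels are not mutually exclusive."""
    instances = list(cfp_feats) + list(oct_feats)
    pooled, weights = attention_pool(instances, w_att)
    probs = [1.0 / (1.0 + math.exp(-(dot(w, pooled) + b)))
             for w, b in zip(w_cls, b_cls)]
    return probs, weights


# Toy case: one CFP feature vector and two OCT B-scan feature vectors.
probs, weights = predict_case(
    cfp_feats=[[1.0, 0.0]],
    oct_feats=[[0.0, 1.0], [0.0, 2.0]],
    w_att=[0.0, 1.0],
    w_cls=[[1.0, 0.0], [0.0, 1.0]],
    b_cls=[0.0, 0.0],
)
```

In a sketch like this, the attention weights indicate which instance (the CFP or a particular B-scan) contributed most to the pooled representation, which is one common way to obtain the kind of selectivity and interpretability the abstract asks for.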

Taxonomy

Methods: Softmax · Linear Layer