Masked Image Modelling for retinal OCT understanding

Theodoros Pissas, Pablo Márquez-Neila, Sebastian Wolf, Martin Zinkernagel, Raphael Sznitman

arXiv:2405.14788·cs.CV·May 24, 2024

Open Access, 1 repository

TL;DR

This paper demonstrates that masked autoencoders, pretrained on a large-scale clinical dataset of 700K OCT images from 41K patients, learn effective representations of retinal OCT images. The resulting model improves performance on multiple downstream tasks and can be extended to fuse OCT with IR fundus images.

Contribution

The paper presents the first extensive evaluation of masked image modelling for OCT, covering 6 downstream tasks, and extends MAE pretraining to fuse OCT with IR fundus images in a joint model that improves performance on a multimodal task.

Findings

01. Strong performance on 6 downstream tasks after fine-tuning

02. Effective as a frozen feature extractor with lightweight adapters

03. Improved multimodal performance with joint OCT and IR model

Abstract

This work explores the effectiveness of masked image modelling for learning representations of retinal OCT images. To this end, we leverage Masked Autoencoders (MAE), a simple and scalable method for self-supervised learning, to obtain a powerful and general representation for OCT images by training on 700K OCT images from 41K patients collected under real-world clinical settings. We also provide the first extensive evaluation for a model of OCT on a challenging battery of 6 downstream tasks. Our model achieves strong performance when fully finetuned, but can also serve as a versatile frozen feature extractor for many tasks using lightweight adapters. Furthermore, we propose an extension of the MAE pretraining to fuse OCT with an auxiliary modality, namely IR fundus images, and learn a joint model for both. We demonstrate that our approach improves performance on a multimodal downstream…
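As a rough illustration of the masked image modelling idea the abstract refers to, the sketch below splits an image into patches, hides a random subset, and scores a reconstruction only on the hidden patches. It is a generic NumPy sketch and not the authors' implementation: the 224x224 single-channel input, the patch size of 16, the mask ratio of 0.75, and the helper names `patchify`, `random_masking` and `masked_mse` are all assumptions chosen for illustration, and the random array merely stands in for an OCT image.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W) image into non-overlapping, flattened square patches."""
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size)
    patches = patches.transpose(0, 2, 1, 3)  # (gh, gw, patch, patch)
    return patches.reshape(gh * gw, patch_size * patch_size)


def random_masking(patches, mask_ratio, rng):
    """Keep a random subset of patches; mask[i] is True where patch i is hidden."""
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx


def masked_mse(pred, target, mask):
    """Mean squared reconstruction error computed on hidden patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return per_patch[mask].mean()


# Hypothetical setup: a random array stands in for one OCT image.
rng = np.random.default_rng(0)
image = rng.random((224, 224)).astype(np.float32)
patches = patchify(image, patch_size=16)  # assumed patch size
visible, mask, keep_idx = random_masking(patches, mask_ratio=0.75, rng=rng)  # assumed ratio
```

In an MAE-style setup, only `visible` would be passed to the encoder, and a lightweight decoder would predict all patches, with `masked_mse` serving as the training loss on the hidden ones.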

Code & Models

Repositories

theopis/mim_oct (PyTorch, official)

Taxonomy

Topics: Retinal Imaging and Analysis

Methods: Masked autoencoder