Masked Image Modelling for retinal OCT understanding
Theodoros Pissas, Pablo Márquez-Neila, Sebastian Wolf, Martin Zinkernagel, Raphael Sznitman

TL;DR
This paper demonstrates that masked autoencoders can effectively learn representations of retinal OCT images, improving performance on multiple tasks and enabling multimodal fusion with IR fundus images, using a large-scale dataset.
Contribution
It introduces the first extensive evaluation of masked image modelling for OCT, and extends MAE pretraining to multimodal fusion with IR fundus images for improved performance.
Findings
Strong performance on 6 downstream tasks after fine-tuning
Effective as a frozen feature extractor with lightweight adapters
Improved multimodal performance with joint OCT and IR model
Abstract
This work explores the effectiveness of masked image modelling for learning representations of retinal OCT images. To this end, we leverage Masked Autoencoders (MAE), a simple and scalable method for self-supervised learning, to obtain a powerful and general representation for OCT images by training on 700K OCT images from 41K patients collected under real-world clinical settings. We also provide the first extensive evaluation for a model of OCT on a challenging battery of 6 downstream tasks. Our model achieves strong performance when fully finetuned but can also serve as a versatile frozen feature extractor for many tasks using lightweight adapters. Furthermore, we propose an extension of the MAE pretraining to fuse OCT with an auxiliary modality, namely IR fundus images, and learn a joint model for both. We demonstrate our approach improves performance on a multimodal downstream…
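The masking step at the core of MAE pretraining can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 224x224 input, the 16-pixel patch size, and the 0.75 mask ratio are assumptions borrowed from common MAE setups rather than values stated in the abstract, and the function names (patchify, random_masking, masked_mse) are hypothetical. The encoder would see only the visible patches, and the reconstruction loss would be computed only on the masked ones.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W) image into non-overlapping, flattened patches."""
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p) -> (gh, gw, p, p) -> (gh * gw, p * p)
    patches = image.reshape(gh, patch_size, gw, patch_size)
    patches = patches.transpose(0, 2, 1, 3)
    return patches.reshape(gh * gw, patch_size * patch_size)


def random_masking(patches, mask_ratio, rng):
    """Keep a random subset of patches; mask is True where a patch is hidden."""
    n = patches.shape[0]
    n_keep = int(round(n * (1.0 - mask_ratio)))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx


def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())


# Stand-in for a single grayscale OCT B-scan (assumed size, random content).
rng = np.random.default_rng(0)
image = rng.random((224, 224))

patches = patchify(image, patch_size=16)      # shape (196, 256)
visible, mask, keep_idx = random_masking(patches, mask_ratio=0.75, rng=rng)
# visible has shape (49, 256); 147 of the 196 patches are masked.
```

In an actual pipeline, a ViT encoder would process `visible`, a lightweight decoder would predict all patches, and `masked_mse` would compare the predictions against `patches` on the masked positions.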
Taxonomy
Topics: Retinal Imaging and Analysis
Methods: Masked autoencoder
