MIM-OOD: Generative Masked Image Modelling for Out-of-Distribution Detection in Medical Images
Sergio Naval Marimont, Vasilis Siomos, Giacomo Tarroni

TL;DR
MIM-OOD is a fast, effective method for unsupervised out-of-distribution (OOD) detection in medical images. It replaces the autoregressive model with two task-specific transformers: one that identifies anomalous tokens and one that in-paints them.
Contribution
The paper presents MIM-OOD, which applies masked image modelling with transformers to make OOD detection in medical images both faster and more accurate.
Findings
MIM-OOD outperforms autoregressive (AR) models on DICE score (0.458 vs 0.301)
MIM-OOD achieves a 25x speedup at inference time over AR models
Results are demonstrated on brain MRI anomalies
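For readers unfamiliar with the DICE score cited above, the following is a minimal sketch of how the overlap between a predicted anomaly mask and a ground-truth mask is commonly computed. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the paper's exact evaluation protocol is not described on this page.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two binary masks.

    Hypothetical helper for illustration, not the paper's evaluation code.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / denom

# Toy masks: one overlapping pixel, two positive pixels in each mask.
pred = [[1, 1, 0], [0, 0, 0]]
target = [[1, 0, 0], [1, 0, 0]]
score = dice_score(pred, target)
```

A score of 1 means the two masks coincide exactly and 0 means they do not overlap at all, so the reported 0.458 vs 0.301 reflects a higher overlap with the annotated anomalies.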
Abstract
Unsupervised Out-of-Distribution (OOD) detection consists in identifying anomalous regions in images leveraging only models trained on images of healthy anatomy. An established approach is to tokenize images and model the distribution of tokens with Auto-Regressive (AR) models. AR models are used to 1) identify anomalous tokens and 2) in-paint anomalous representations with in-distribution tokens. However, AR models are slow at inference time and prone to error accumulation issues which negatively affect OOD detection performance. Our novel method, MIM-OOD, overcomes both speed and error accumulation issues by replacing the AR model with two task-specific networks: 1) a transformer optimized to identify anomalous tokens and 2) a transformer optimized to in-paint anomalous tokens using masked image modelling (MIM). Our experiments with brain MRI anomalies show that MIM-OOD substantially…
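The two-stage idea in the abstract (first flag anomalous tokens, then in-paint them with in-distribution tokens) can be sketched on a toy token grid. Everything below is a stand-in under stated assumptions: the real method uses two trained transformers, whereas this sketch substitutes simple per-position token frequencies estimated from healthy examples. The function names, the `threshold` parameter, and the frequency-based scoring are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_healthy_stats(healthy_tokens, vocab_size):
    """Per-position token frequencies from healthy token grids of shape (N, H, W)."""
    n, h, w = healthy_tokens.shape
    counts = np.zeros((h, w, vocab_size))
    for k in range(vocab_size):
        counts[..., k] = (healthy_tokens == k).sum(axis=0)
    return counts / n

def identify_anomalous_tokens(tokens, freqs, threshold=0.1):
    """Stand-in for the detection transformer: flag tokens that are
    rare at their grid position among healthy examples."""
    rows, cols = np.indices(tokens.shape)
    likelihood = freqs[rows, cols, tokens]
    return likelihood < threshold

def inpaint_tokens(tokens, mask, freqs):
    """Stand-in for the MIM transformer: replace masked tokens with the
    most frequent healthy token at that position. A real masked image
    model would condition on the surrounding unmasked tokens instead."""
    restored = tokens.copy()
    restored[mask] = freqs.argmax(axis=-1)[mask]
    return restored

# Toy data: healthy grids contain only token 0; the test grid has one odd token.
healthy = np.zeros((10, 4, 4), dtype=int)
test_grid = np.zeros((4, 4), dtype=int)
test_grid[1, 2] = 3

freqs = fit_healthy_stats(healthy, vocab_size=4)
mask = identify_anomalous_tokens(test_grid, freqs)
restored = inpaint_tokens(test_grid, mask, freqs)
```

Splitting detection and in-painting into two separate steps mirrors the abstract's description of two task-specific networks, in contrast to a single AR model that performs both roles token by token.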
Taxonomy
Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
