Multimodal Self-Supervised Learning for Medical Image Analysis
Aiham Taleb, Christoph Lippert, Tassilo Klein, and Moin Nabi

TL;DR
This paper introduces a novel multimodal self-supervised learning method using a puzzle task and synthetic data augmentation to improve medical image analysis across various tasks and modalities.
Contribution
It proposes a multimodal puzzle task with permutation inference and synthetic data augmentation for enhanced self-supervised learning in medical imaging.
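To make the puzzle task concrete, here is a minimal sketch of how a multimodal puzzle could be built. It assumes the modalities are co-registered 2D arrays of the same shape. The function name `make_multimodal_puzzle`, the `grid` parameter, and the uniform per-tile choice of modality are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def make_multimodal_puzzle(modalities, grid=3, rng=None):
    """Cut a grid of tiles, drawing each tile from a randomly chosen modality,
    then shuffle the tiles.

    modalities: list of 2D arrays with identical shape (assumed co-registered).
    Returns (shuffled_tiles, perm) with shuffled_tiles[i] == ordered_tiles[perm[i]].
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = modalities[0].shape
    th, tw = h // grid, w // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            m = rng.integers(len(modalities))  # mix modalities at the data level
            tiles.append(modalities[m][r * th:(r + 1) * th, c * tw:(c + 1) * tw])
    tiles = np.stack(tiles)
    perm = rng.permutation(grid * grid)  # ground-truth permutation to recover
    return tiles[perm], perm
```

A model trained on such puzzles would have to predict `perm`. Writing the shuffled tiles back to the positions given by `perm` restores the original order.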
Findings
Richer semantic representations than when each modality is treated independently.
Enhanced downstream task performance and data efficiency.
Effective use of synthetic images for pretraining.
Abstract
Self-supervised learning approaches leverage unlabeled samples to acquire generic knowledge about different concepts, hence allowing for annotation-efficient downstream task learning. In this paper, we propose a novel self-supervised method that leverages multiple imaging modalities. We introduce the multimodal puzzle task, which facilitates rich representation learning from multiple image modalities. The learned representations allow for subsequent fine-tuning on different downstream tasks. To achieve that, we learn a modality-agnostic feature embedding by confusing image modalities at the data-level. Together with the Sinkhorn operator, with which we formulate the puzzle solving optimization as permutation matrix inference instead of classification, they allow for efficient solving of multimodal puzzles with varying levels of complexity. In addition, we also propose to utilize…
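The permutation-inference idea in the abstract can be illustrated with a minimal sketch of the Sinkhorn operator: alternately normalizing the rows and columns of a score matrix yields an approximately doubly stochastic matrix, which acts as a differentiable relaxation of a permutation matrix. The function name `sinkhorn` and the parameters `n_iters` and `tau` (a temperature) are assumptions chosen for illustration; this is not the authors' code.

```python
import numpy as np

def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis, keeping dimensions.
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sinkhorn(scores, n_iters=20, tau=1.0):
    """Map a square score matrix to an approximately doubly stochastic matrix.

    Works in log space: each iteration normalizes rows, then columns.
    A lower tau pushes the result closer to a hard permutation matrix.
    """
    log_p = np.asarray(scores, dtype=float) / tau
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # rows sum to 1
        log_p = log_p - _logsumexp(log_p, axis=0)  # columns sum to 1
    return np.exp(log_p)
```

In a puzzle-solving setup, `scores` would come from a network that rates every tile-to-position assignment. A hard permutation can then be read off the soft matrix, for example by taking a row-wise argmax when the matrix is sharply peaked.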
