Multitask Multimodal Self-Supervised Learning for Medical Images

Cristian Simionescu

arXiv:2510.23325 · cs.CV · October 28, 2025

TL;DR

This thesis introduces Medformer, a multitask, multimodal self-supervised learning framework for medical images that reduces dependence on labeled data and handles images of diverse modalities and sizes.

Contribution

The thesis presents Medformer, a neural network architecture for multitask learning and deep domain adaptation in medical imaging, together with new self-supervised pretext tasks and a dynamic input-output adaptation mechanism.

Findings

1. Medformer pre-trains effectively on diverse medical image datasets.
2. The model generalizes well to a variety of downstream tasks.
3. The approach reduces the need for extensive labeled datasets.

Abstract

This thesis addresses a pivotal challenge in medical image analysis: the reliance on extensive labeled datasets, which are often scarce because they require expert annotation and are constrained by privacy and legal issues. By developing self-supervised learning techniques and domain adaptation methods, this research aims to circumvent these limitations and presents a new approach to improving the utility and efficacy of deep learning in medical imaging. Central to the thesis is the Medformer, a neural network architecture designed for multitask learning and deep domain adaptation. The model can pre-train on diverse medical image datasets of varying sizes and modalities, and it is equipped with a dynamic input-output adaptation mechanism. This enables efficient processing and integration of a wide range of medical…
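
The abstract describes the dynamic input-output adaptation mechanism only at a high level, so its actual design is not given here. As a rough illustration of the general idea, the following sketch assumes a common multitask pattern: a separate input adapter per modality (to absorb differing channel counts), global average pooling (to absorb differing image sizes), a shared trunk, and a separate output head per task. All names (`AdapterMultitaskModel`, `make_linear`, the modality and task keys) are hypothetical. This is not the Medformer implementation, and it omits training entirely.

```python
import random
from typing import Dict, List

# Hypothetical type: an image stored as nested lists indexed [channel][row][column].
Image = List[List[List[float]]]


def make_linear(in_dim: int, out_dim: int, rng: random.Random) -> List[List[float]]:
    """Random weight matrix of shape (out_dim, in_dim); no bias, for brevity."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights: List[List[float]], vec: List[float]) -> List[float]:
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class AdapterMultitaskModel:
    """Hypothetical sketch: per-modality input adapters, a shared trunk, per-task output heads."""

    def __init__(self, input_channels: Dict[str, int], task_outputs: Dict[str, int],
                 width: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.input_channels = dict(input_channels)
        # One input adapter per modality maps that modality's channel count to a shared width.
        self.input_adapters = {m: make_linear(c, width, rng) for m, c in input_channels.items()}
        # Trunk weights shared by every modality and task.
        self.trunk = make_linear(width, width, rng)
        # One output head per task maps the shared features to that task's output size.
        self.output_heads = {t: make_linear(width, k, rng) for t, k in task_outputs.items()}

    def encode(self, image: Image, modality: str) -> List[float]:
        expected = self.input_channels[modality]
        if len(image) != expected:
            raise ValueError(f"{modality} expects {expected} channels, got {len(image)}")
        adapter = self.input_adapters[modality]
        rows, cols = len(image[0]), len(image[0][0])
        pooled = [0.0] * len(adapter)
        # Project every pixel's channel vector, then average over all pixels so the
        # feature length does not depend on the image's height or width.
        for r in range(rows):
            for c in range(cols):
                pixel = [channel[r][c] for channel in image]
                pooled = [p + q for p, q in zip(pooled, apply_linear(adapter, pixel))]
        pooled = [p / (rows * cols) for p in pooled]
        # Shared trunk followed by a ReLU nonlinearity.
        return [max(0.0, v) for v in apply_linear(self.trunk, pooled)]

    def forward(self, image: Image, modality: str, task: str) -> List[float]:
        return apply_linear(self.output_heads[task], self.encode(image, modality))


if __name__ == "__main__":
    model = AdapterMultitaskModel({"ct": 1, "histology_rgb": 3}, {"binary_label": 2, "grade": 5})
    ct_slice = [[[0.5] * 4 for _ in range(4)]]                # 1 channel, 4 x 4
    tile = [[[0.1] * 7 for _ in range(5)] for _ in range(3)]  # 3 channels, 5 x 7
    print(len(model.forward(ct_slice, "ct", "grade")))                # 5
    print(len(model.forward(tile, "histology_rgb", "binary_label")))  # 2
```

In this pattern, supporting a new dataset means adding one adapter and one head while reusing the shared trunk unchanged. A practical system would more likely use learned convolutions or patch embeddings in place of the per-pixel linear maps used here for brevity.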
