# Overcoming data scarcity in biomedical imaging with a foundational multi-task model

**Authors:** Raphael Schäfer, Till Nicke, Henning Höfener, Annkristin Lange, Dorit Merhof, Friedrich Feuerhake, Volkmar Schulz, Johannes Lotz, Fabian Kiessling

PMC · DOI: 10.1038/s43588-024-00662-z · 2024-07-19

## TL;DR

This paper introduces UMedPT, a foundational model for biomedical imaging that achieves high performance with limited training data across various tasks and datasets.

## Contribution

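The paper proposes a multi-task learning strategy that decouples the number of training tasks from memory requirements, allowing a single model (UMedPT) to be pretrained on a database that combines classification, segmentation and object detection tasks.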
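
One common way to keep memory independent of the number of tasks is to visit tasks one at a time and keep only a running gradient sum for the shared parameters. The summary does not say how the paper achieves the decoupling, so the following is a minimal sketch under that assumption, not the authors' training loop. The task names, the toy data and the linear model are all hypothetical.

```python
# Minimal sketch: per-task gradient accumulation on a shared parameter vector.
# Assumption: tasks are processed sequentially, so only one task's batch is
# held at a time. Task names, data and the linear model are hypothetical.

def task_gradient(w, batch):
    """Gradient of the mean squared error of a linear model y ~ w.x on one batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def accumulate_step(w, task_batches, lr=0.1):
    """One update: visit each task in turn and keep only a running gradient sum."""
    acc = [0.0] * len(w)
    for batch in task_batches.values():
        g = task_gradient(w, batch)  # only this task's batch is needed here
        acc = [a + gi for a, gi in zip(acc, g)]
    # Apply the averaged gradient once all tasks have contributed.
    return [wi - lr * a / len(task_batches) for wi, a in zip(w, acc)]


tasks = {
    "classification_like": [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)],
    "segmentation_like": [((1.0, 1.0), 1.0)],
    "detection_like": [((2.0, 0.0), 2.0)],
}

w = [0.0, 0.0]
for _ in range(200):
    w = accumulate_step(w, tasks)
print([round(v, 3) for v in w])
```

Adding a fourth task to `tasks` lengthens each step but does not increase the amount of state held between tasks, which is the property the contribution statement describes.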

## Key findings

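- UMedPT outperformed both ImageNet pretraining and previous state-of-the-art models on biomedical imaging tasks.
- For classification tasks related to the pretraining database (in-domain), it maintained its performance with only 1% of the original training data and without fine-tuning; out-of-domain tasks required only 50% of the original training data.
- In an external independent validation, imaging features extracted with UMedPT showed superior cross-center transferability.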
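
The data-fraction findings above imply training on reduced subsets of each dataset. How the paper drew those subsets is not stated here, so the sketch below assumes class-stratified random sampling as one plausible protocol. The function name `stratified_fraction`, the toy dataset and the class labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_fraction(samples, fraction, seed=0):
    """Keep roughly `fraction` of the (item, label) pairs per class, at least one each."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label, items in sorted(by_label.items()):
        k = max(1, round(len(items) * fraction))
        subset.extend((item, label) for item in rng.sample(items, k))
    return subset


# Hypothetical toy dataset: 1,000 image ids with two imbalanced classes.
full = [(f"img_{i}", "benign" if i % 4 else "malignant") for i in range(1000)]
for fraction in (0.01, 0.5, 1.0):
    subset = stratified_fraction(full, fraction)
    print(fraction, len(subset))
```

Sampling per class keeps rare labels represented even at the 1% setting, where a plain random draw could drop a class entirely.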

## Abstract

Foundational models, pretrained on a large scale, have demonstrated substantial success across non-medical domains. However, training these models typically requires large, comprehensive datasets, which contrasts with the smaller and more specialized datasets common in biomedical imaging. Here we propose a multi-task learning strategy that decouples the number of training tasks from memory requirements. We trained a universal biomedical pretrained model (UMedPT) on a multi-task database including tomographic, microscopic and X-ray images, with various labeling strategies such as classification, segmentation and object detection. The UMedPT foundational model outperformed ImageNet pretraining and previous state-of-the-art models. For classification tasks related to the pretraining database, it maintained its performance with only 1% of the original training data and without fine-tuning. For out-of-domain tasks it required only 50% of the original training data. In an external independent validation, imaging features extracted using UMedPT proved to set a new standard for cross-center transferability.

UMedPT, a foundational model for biomedical imaging, has been trained on a variety of medical tasks with different types of label. It has achieved high performance with less training data in various clinical applications.
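
The abstract's "without fine-tuning" setting means the pretrained encoder stays frozen and only a lightweight classifier is trained on its features. The sketch below illustrates that general pattern with a logistic-regression head. The `frozen_encoder` function is a hypothetical stand-in (simple pixel statistics), not the UMedPT network, and the toy images and labels are invented for illustration.

```python
import math


def frozen_encoder(image):
    """Hypothetical stand-in for a pretrained encoder; it is never updated."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]


def train_head(features, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression head on precomputed features by gradient descent."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                gw[i] += (p - y) * xi / len(features)
            gb += (p - y) / len(features)
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Return 1 if the head's logit is positive, else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Hypothetical toy "images" as flat pixel lists: class 1 is brighter than class 0.
images = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.8, 0.9, 0.7], [0.9, 0.7, 0.8]]
labels = [0, 0, 1, 1]

features = [frozen_encoder(img) for img in images]  # computed once, encoder untouched
w, b = train_head(features, labels)
print([predict(w, b, x) for x in features])
```

Because the features are computed once and reused, only the small head is optimized, which is why this setting is cheap and suits very small training sets.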

## Full-text entities

- **Genes:** AGAP3 (ArfGAP with GTPase domain, ankyrin repeat and PH domain 3) [NCBI Gene 116988] {aka AGAP-3, CENTG3, CRAG, MRIP-1, cnt-g3}
- **Diseases:** meningioma (MESH:D008579), phylloides tumor (MESH:D003557), mucinous carcinoma (MESH:D002288), Breast cancer (MESH:D001943), Pneumonia (MESH:D011014), ductal carcinoma (MESH:D044584), invasive cancers (MESH:D009362), in situ carcinomas (MESH:D002278), Cancer (MESH:D009369), tubular adenoma (MESH:D000236), lung nodule (MESH:D003074), Benign lesions (MESH:D001932), CRC (MESH:D015179), CNS neoplasms (MESH:D016543), prostate and colon cancer (MESH:D011471), invasive carcinoma (MESH:D009361), colorectal adenocarcinoma (MESH:D003110), glioma (MESH:D005910), papillary carcinoma (MESH:D002291), Tuberculosis (MESH:D014376), lobular carcinoma (MESH:D018275), polyp (MESH:D011127), adenosis (MESH:D005348), fibroadenoma (MESH:D018226), pituitary tumor (MESH:D010911)
- **Chemicals:** hematoxylin (MESH:D006416)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11288886/full.md

---
Source: https://tomesphere.com/paper/PMC11288886