USF-MAE: Ultrasound Self-Supervised Foundation Model with Masked Autoencoding
Youssef Megahed, Robin Ducharme, Aylin Erman, Mark Walker, Steven Hawken, and Adrian D. C. Chan

TL;DR
USF-MAE is a large-scale self-supervised ultrasound model pretrained with masked autoencoding on unlabeled images. It improves ultrasound image classification accuracy over CNN and ViT baselines on three clinical benchmarks.
Contribution
This work is the first to develop a large-scale self-supervised MAE framework exclusively for ultrasound data, leveraging 370,000 images from diverse sources to learn modality-specific representations.
Findings
Outperforms CNN and ViT baselines on three benchmarks.
Approaches supervised model performance without using labels during pretraining.
Demonstrates strong cross-anatomical generalization.
Abstract
Ultrasound imaging is one of the most widely used diagnostic modalities, offering real-time, radiation-free assessment across diverse clinical domains. However, interpretation of ultrasound images remains challenging due to high noise levels, operator dependence, and limited field of view, resulting in substantial inter-observer variability. Current deep learning approaches are hindered by the scarcity of large labeled datasets and the domain gap between general and sonographic images, which limits the transferability of models pretrained on non-medical data. To address these challenges, we introduce the Ultrasound Self-Supervised Foundation Model with Masked Autoencoding (USF-MAE), the first large-scale self-supervised MAE framework pretrained exclusively on ultrasound data. The model was pretrained on 370,000 2D and 3D ultrasound images curated from 46 open-source datasets,…
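To make the masked-autoencoding idea concrete, here is a minimal NumPy sketch of the pretext task: split an image into patches, hide a random subset, and score a reconstruction only on the hidden patches. It is an illustration, not the authors' implementation. The 224x224 input, 16-pixel patches, and 75% mask ratio are assumptions borrowed from the original MAE recipe, since the summary above does not state USF-MAE's settings. The function names are hypothetical, and a zero array stands in for the output of a real encoder and decoder.

```python
import numpy as np

def patchify(image, patch):
    """Split a 2D image of shape (H, W) into flattened non-overlapping patches."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch) -> (gh, gw, patch, patch) -> (gh*gw, patch*patch)
    return (image.reshape(gh, patch, gw, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(gh * gw, patch * patch))

def random_mask(num_patches, mask_ratio, rng):
    """Pick which patches stay visible; mask[i] is True where patch i is hidden."""
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    keep_idx = np.sort(rng.permutation(num_patches)[:num_keep])
    mask = np.ones(num_patches, dtype=bool)
    mask[keep_idx] = False
    return keep_idx, mask

def masked_mse(pred, target, mask):
    """Mean squared error averaged over the hidden patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())

rng = np.random.default_rng(0)
image = rng.random((224, 224))       # random stand-in for a grayscale ultrasound frame
patches = patchify(image, 16)        # assumed patch size
keep_idx, mask = random_mask(len(patches), 0.75, rng)  # assumed mask ratio
visible = patches[keep_idx]          # only these patches would reach the encoder

pred = np.zeros_like(patches)        # placeholder for a decoder's reconstruction
loss = masked_mse(pred, patches, mask)
print(patches.shape, visible.shape, int(mask.sum()))
```

Because the loss depends only on pixel values of the image itself, no labels are involved at this stage, which is what allows pretraining on unlabeled ultrasound collections.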
