Adapting Foundation Models for Annotation-Efficient Adnexal Mass Segmentation in Cine Images
Francesca Fati, Alberto Rota, Adriana V. Gregory, Anna Catozzo, Maria C. Giuliano, Mrinal Dhar, Luigi De Vitis, Annie T. Packard, Francesco Multinu, Elena De Momi, Carrie L. Langstraat, Timothy L. Kline

TL;DR
This paper introduces a data-efficient, foundation model-based segmentation framework for adnexal mass ultrasound images, achieving state-of-the-art results with limited labeled data.
Contribution
It pairs a pretrained DINOv3 vision transformer backbone with a Dense Prediction Transformer (DPT)-style decoder to improve segmentation accuracy and label efficiency in data-scarce medical imaging scenarios.
Findings
Achieves a Dice score of 0.945 on clinical ultrasound data.
Reduces Hausdorff Distance by 11.4% compared to convolutional baselines.
Maintains strong performance with only 25% of training data.
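For readers less familiar with the two metrics cited in these findings, the following is a minimal sketch of how a Dice score and a symmetric Hausdorff distance can be computed for a pair of binary masks. It is not the authors' evaluation code: the function names are hypothetical, the Hausdorff distance is taken over all foreground pixels by brute force, and the exact variant used in the paper (for example, a boundary-based or percentile version) is not stated in this summary.

```python
import numpy as np


def dice_score(pred, target):
    """Dice overlap between two binary masks (1.0 means identical)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / denom


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance (in pixels) between foreground pixel sets.

    Brute-force pairwise distances; fine for small masks, slow for large ones.
    """
    p = np.argwhere(np.asarray(pred, dtype=bool))
    t = np.argwhere(np.asarray(target, dtype=bool))
    if len(p) == 0 or len(t) == 0:
        # Assumption: distance is undefined (infinite) if either mask is empty.
        return float("inf")
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


# Toy example: the prediction misses one column of the target.
pred = np.zeros((4, 4), dtype=int)
target = np.zeros((4, 4), dtype=int)
pred[1:3, 1:3] = 1    # 4 foreground pixels
target[1:3, 1:4] = 1  # 6 foreground pixels

print(dice_score(pred, target))          # 2 * 4 / (4 + 6) = 0.8
print(hausdorff_distance(pred, target))  # 1.0
```

Dice rewards area overlap, while the Hausdorff distance penalizes the single worst boundary error, so the two metrics can move independently.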
Abstract
Adnexal mass evaluation via ultrasound is a challenging clinical task, often hindered by subjective interpretation and significant inter-observer variability. While automated segmentation is a foundational step for quantitative risk assessment, traditional fully supervised convolutional architectures frequently require large amounts of pixel-level annotations and struggle with domain shifts common in medical imaging. In this work, we propose a label-efficient segmentation framework that leverages the robust semantic priors of a pretrained DINOv3 foundational vision transformer backbone. By integrating this backbone with a Dense Prediction Transformer (DPT)-style decoder, our model hierarchically reassembles multi-scale features to combine global semantic representations with fine-grained spatial details. Evaluated on a clinical dataset of 7,777 annotated frames from 112 patients, our…
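The hierarchical reassembly described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' decoder: it has no learned weights, all function names are hypothetical, and the resampling factors (4, 2, 1, 0.5) are an assumption borrowed from the original DPT design. It only shows the data flow, in which token sequences from several transformer layers are reshaped into grids, resampled to different resolutions, and fused from coarse to fine.

```python
import numpy as np


def tokens_to_grid(tokens, h, w):
    """Reshape a (h*w, d) patch-token sequence into a (d, h, w) feature map."""
    return tokens.reshape(h, w, -1).transpose(2, 0, 1)


def resample(x, factor):
    """Nearest-neighbour upsampling (factor >= 1) or average pooling (factor < 1)."""
    if factor >= 1:
        f = int(factor)
        return x.repeat(f, axis=1).repeat(f, axis=2)
    s = int(round(1 / factor))
    d, h, w = x.shape
    return x.reshape(d, h // s, s, w // s, s).mean(axis=(2, 4))


def fuse_multiscale(layer_tokens, h, w, factors=(4, 2, 1, 0.5)):
    """Build a feature pyramid from several layers and merge it coarse to fine.

    A real DPT-style decoder would use learned projections and convolutional
    fusion blocks; plain addition stands in for them here.
    """
    maps = [resample(tokens_to_grid(t, h, w), f)
            for t, f in zip(layer_tokens, factors)]
    fused = maps[-1]  # start from the coarsest map
    for finer in reversed(maps[:-1]):
        fused = resample(fused, 2) + finer  # upsample 2x, then add detail
    return fused


# Toy input: 4 layers, a 4x4 token grid, 8 channels per token.
h, w, d = 4, 4, 8
layer_tokens = [np.ones((h * w, d)) for _ in range(4)]
fused = fuse_multiscale(layer_tokens, h, w)
print(fused.shape)  # (8, 16, 16)
```

The coarse maps stand in for global semantic context and the fine maps for spatial detail; a segmentation head would then map the fused features to per-pixel mask logits.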
