Adapting Foundation Models for Annotation-Efficient Adnexal Mass Segmentation in Cine Images

Francesca Fati; Alberto Rota; Adriana V. Gregory; Anna Catozzo; Maria C. Giuliano; Mrinal Dhar; Luigi De Vitis; Annie T. Packard; Francesco Multinu; Elena De Momi; Carrie L. Langstraat; Timothy L. Kline

arXiv:2604.08045·cs.CV·April 10, 2026

TL;DR

This paper introduces a data-efficient, foundation model-based segmentation framework for adnexal mass ultrasound images. It reaches a Dice score of 0.945 on clinical data and remains strong when trained on only 25% of the labeled data.

Contribution

It pairs a pretrained DINOv3 vision transformer backbone with a Dense Prediction Transformer (DPT)-style decoder to improve segmentation accuracy and label efficiency in data-scarce medical imaging scenarios.

Findings

01 Achieves a Dice score of 0.945 on clinical ultrasound data.

02 Reduces Hausdorff Distance by 11.4% compared to convolutional baselines.

03 Maintains strong performance with only 25% of training data.
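
For readers unfamiliar with the two metrics reported above, the following is a minimal NumPy sketch of how a Dice score and a symmetric Hausdorff distance can be computed for a pair of binary masks. It is an illustration only: the function names are hypothetical, the distance is taken over all foreground pixels with a brute-force search, and the paper may use a different variant (for example a boundary-based or percentile Hausdorff distance).

```python
import numpy as np


def dice_score(pred, target):
    """Dice overlap between two binary masks (1.0 means identical)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks empty: treat as perfect agreement (a convention, assumed here).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / total


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance (in pixels) between foreground pixel sets."""
    p = np.argwhere(np.asarray(pred, dtype=bool))
    t = np.argwhere(np.asarray(target, dtype=bool))
    if len(p) == 0 or len(t) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # Pairwise Euclidean distances; brute force, so only suitable for small masks.
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    # Largest nearest-neighbour distance in either direction.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


# Toy example: a 4x4 square and the same square shifted one pixel to the right.
pred = np.zeros((10, 10), dtype=bool)
target = np.zeros((10, 10), dtype=bool)
pred[2:6, 2:6] = True
target[2:6, 3:7] = True

dice = dice_score(pred, target)
hd = hausdorff_distance(pred, target)
```

Dice rewards area overlap, while the Hausdorff distance is driven by the single worst-placed pixel, so the two metrics can move independently when a prediction has a good overall overlap but a stray boundary error.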

Abstract

Adnexal mass evaluation via ultrasound is a challenging clinical task, often hindered by subjective interpretation and significant inter-observer variability. While automated segmentation is a foundational step for quantitative risk assessment, traditional fully supervised convolutional architectures frequently require large amounts of pixel-level annotations and struggle with domain shifts common in medical imaging. In this work, we propose a label-efficient segmentation framework that leverages the robust semantic priors of a pretrained DINOv3 foundational vision transformer backbone. By integrating this backbone with a Dense Prediction Transformer (DPT)-style decoder, our model hierarchically reassembles multi-scale features to combine global semantic representations with fine-grained spatial details. Evaluated on a clinical dataset of 7,777 annotated frames from 112 patients, our…
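
The hierarchical reassembly idea described in the abstract can be sketched roughly as follows. This is a toy NumPy illustration of a DPT-style decoder, not the authors' implementation: the function names, the choice of four tapped layers, the scale factors, the nearest-neighbour resizing, and the random weights are all assumptions made for the example. Patch tokens from several backbone layers are reshaped into feature maps, projected to a common channel width, resampled to different resolutions, and fused from coarse to fine.

```python
import numpy as np


def tokens_to_grid(tokens, h, w):
    """Reshape (h*w, D) patch tokens into a (D, h, w) feature map."""
    return tokens.reshape(h, w, -1).transpose(2, 0, 1)


def project(x, weight):
    """1x1 convolution: weight is (C_out, C_in), x is (C_in, h, w)."""
    return np.einsum("oc,chw->ohw", weight, x)


def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, h, w) map to (C, out_h, out_w)."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows[:, None], cols[None, :]]


def dpt_style_decode(layer_tokens, h, w, proj_weights, head_weight,
                     scales=(4, 2, 1, 0.5)):
    """Reassemble tokens from several layers and fuse them coarse to fine."""
    maps = []
    for tokens, weight, s in zip(layer_tokens, proj_weights, scales):
        grid = project(tokens_to_grid(tokens, h, w), weight)
        maps.append(resize_nearest(grid, int(h * s), int(w * s)))
    # Start from the coarsest map (last entry) and add finer maps one by one.
    fused = maps[-1]
    for finer in reversed(maps[:-1]):
        fused = resize_nearest(fused, finer.shape[1], finer.shape[2]) + finer
    # A single-channel head produces mask logits at the finest resolution.
    return project(fused, head_weight)


# Toy shapes (all hypothetical): a 4x4 patch grid, token width 8, decoder width 4.
rng = np.random.default_rng(0)
h, w, token_dim, dec_dim = 4, 4, 8, 4
layer_tokens = [rng.normal(size=(h * w, token_dim)) for _ in range(4)]
proj_weights = [rng.normal(size=(dec_dim, token_dim)) for _ in range(4)]
head_weight = rng.normal(size=(1, dec_dim))

logits = dpt_style_decode(layer_tokens, h, w, proj_weights, head_weight)
```

In a real decoder the projections and fusion blocks would be learned convolutions with nonlinearities, and the upsampling would typically be bilinear or transposed-convolution based. The sketch only shows the data flow that lets deep, globally informed tokens be combined with higher-resolution maps.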

Code & Models

Repositories

FrancescaFati/MESA (GitHub)