SAM3-UNet: Simplified Adaptation of Segment Anything Model 3

Xinyu Xiong; Zihuang Wu; Lei Lu; Yufa Xia

arXiv:2512.01789 · cs.CV · December 2, 2025

TL;DR

SAM3-UNet is a simplified adaptation of Segment Anything Model 3 (SAM3) for downstream tasks such as mirror detection and salient object detection. It trains in under 6 GB of GPU memory at a batch size of 12.

Contribution

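Introduces SAM3-UNet, a parameter-efficient variant of SAM3 for downstream tasks that combines a SAM3 image encoder, a simple adapter, and a lightweight U-Net-style decoder. In preliminary experiments it outperforms SAM2-UNet and other state-of-the-art methods while training in under 6 GB of GPU memory.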

Findings

01

Outperforms SAM2-UNet and other state-of-the-art methods in preliminary experiments

02

Requires less than 6 GB of GPU memory during training with a batch size of 12

03

Effective on tasks like mirror detection and salient object detection

Abstract

In this paper, we introduce SAM3-UNet, a simplified variant of Segment Anything Model 3 (SAM3), designed to adapt SAM3 for downstream tasks at a low cost. Our SAM3-UNet consists of three components: a SAM3 image encoder, a simple adapter for parameter-efficient fine-tuning, and a lightweight U-Net-style decoder. Preliminary experiments on multiple tasks, such as mirror detection and salient object detection, demonstrate that the proposed SAM3-UNet outperforms the prior SAM2-UNet and other state-of-the-art methods, while requiring less than 6 GB of GPU memory during training with a batch size of 12. The code is publicly available at https://github.com/WZH0120/SAM3-UNet.
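To make the "simple adapter for parameter-efficient fine-tuning" idea concrete, here is a minimal sketch of a generic residual bottleneck adapter applied to features from a frozen encoder. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the adapter design, so the bottleneck structure, the ReLU activation, the zero-initialized up-projection, and every name and size below are hypothetical. It uses plain Python lists in place of a tensor library so that it runs without dependencies.

```python
# Hypothetical sketch of a residual bottleneck adapter.
# NOT the SAM3-UNet code: the design, names, and sizes are assumptions.

def matvec(weights, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def relu(vec):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in vec]


def adapter(x, w_down, w_up):
    """Residual bottleneck adapter: x + W_up * relu(W_down * x)."""
    hidden = relu(matvec(w_down, x))   # project dim -> bottleneck
    update = matvec(w_up, hidden)      # project bottleneck -> dim
    return [xi + ui for xi, ui in zip(x, update)]


def adapter_param_count(dim, bottleneck):
    """Trainable weights in one adapter (biases omitted for brevity)."""
    return 2 * dim * bottleneck


dim, bottleneck = 4, 2  # illustrative sizes only
w_down = [[0.5, -0.5, 0.0, 0.0],
          [0.0, 0.0, 1.0, 1.0]]                   # shape: bottleneck x dim
w_up = [[0.0] * bottleneck for _ in range(dim)]   # shape: dim x bottleneck, zero-initialized
features = [1.0, 2.0, 3.0, 4.0]                   # stand-in for frozen-encoder features

out = adapter(features, w_down, w_up)
```

With a zero-initialized up-projection the adapter starts as an identity mapping, so the frozen encoder's features pass through unchanged until training updates the adapter weights. Only the `2 * dim * bottleneck` adapter weights per block (plus the decoder) would be trained in such a setup, which is the general reason adapter-based fine-tuning keeps the trainable parameter count small.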

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques