SAM3-UNet: Simplified Adaptation of Segment Anything Model 3
Xinyu Xiong, Zihuang Wu, Lei Lu, Yufa Xia

TL;DR
SAM3-UNet is a simplified variant of Segment Anything Model 3 (SAM3) that adapts SAM3 to downstream tasks at low cost, evaluated on tasks such as mirror detection and salient object detection.
Contribution
Introduces SAM3-UNet, which combines a SAM3 image encoder, a simple adapter for parameter-efficient fine-tuning, and a lightweight U-Net-style decoder, and reports better results than the prior SAM2-UNet in preliminary experiments.
Findings
Outperforms the prior SAM2-UNet and other state-of-the-art methods in preliminary experiments
Requires less than 6 GB of GPU memory during training with a batch size of 12
Evaluated on multiple tasks, including mirror detection and salient object detection
Abstract
In this paper, we introduce SAM3-UNet, a simplified variant of Segment Anything Model 3 (SAM3), designed to adapt SAM3 for downstream tasks at a low cost. Our SAM3-UNet consists of three components: a SAM3 image encoder, a simple adapter for parameter-efficient fine-tuning, and a lightweight U-Net-style decoder. Preliminary experiments on multiple tasks, such as mirror detection and salient object detection, demonstrate that the proposed SAM3-UNet outperforms the prior SAM2-UNet and other state-of-the-art methods, while requiring less than 6 GB of GPU memory during training with a batch size of 12. The code is publicly available at https://github.com/WZH0120/SAM3-UNet.
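To give a feel for why adapter-based fine-tuning is called parameter-efficient, the following is a rough counting sketch, not the authors' implementation. It assumes a standard ViT-style encoder block and a bottleneck adapter (a down-projection, a nonlinearity, and an up-projection) attached to each block. The sizes `dim`, `bottleneck`, and `depth` are hypothetical values chosen for illustration and are not taken from the paper, and the decoder's parameters are left out of the count.

```python
# Back-of-the-envelope parameter count: frozen encoder blocks vs. trainable adapters.
# All sizes below are hypothetical illustration values, not figures from the paper.

def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameters of one fully connected layer."""
    return d_in * d_out + (d_out if bias else 0)

def adapter_params(dim: int, bottleneck: int) -> int:
    """Assumed adapter shape: Linear(dim -> bottleneck), nonlinearity, Linear(bottleneck -> dim)."""
    return linear_params(dim, bottleneck) + linear_params(bottleneck, dim)

def vit_block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Rough count for one standard ViT block: qkv, output projection, 2-layer MLP, 2 LayerNorms."""
    attention = linear_params(dim, 3 * dim) + linear_params(dim, dim)
    mlp = linear_params(dim, mlp_ratio * dim) + linear_params(mlp_ratio * dim, dim)
    layer_norms = 2 * (2 * dim)  # each LayerNorm has a scale and a shift vector
    return attention + mlp + layer_norms

dim, bottleneck, depth = 1024, 32, 24  # hypothetical encoder width, adapter width, block count

frozen = depth * vit_block_params(dim)            # encoder weights kept fixed
trainable = depth * adapter_params(dim, bottleneck)  # one adapter per block (assumption)

print(f"frozen encoder parameters:   {frozen:,}")
print(f"trainable adapter parameters: {trainable:,}")
print(f"adapter share of encoder:     {trainable / frozen:.2%}")
```

Under these assumed sizes the adapters amount to well under one percent of the encoder's parameters. Because the encoder weights are frozen, they need no gradients or optimizer state, which is one plausible contributor to a small training memory footprint.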
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
