TL;DR
SAM2-UNet uses the Hiera encoder of SAM2 inside a classic U-shaped segmentation network and reports strong results across diverse image segmentation tasks without complex architectural modifications.
Contribution
This work introduces SAM2-UNet, a versatile segmentation framework that uses the SAM2 Hiera backbone as its encoder, fine-tunes it parameter-efficiently through adapters, and reports better results than existing specialized methods in preliminary experiments.
Findings
Outperforms state-of-the-art methods on multiple segmentation tasks
Demonstrates the effectiveness of SAM2 as a strong encoder
Shows versatility across natural and medical image segmentation
Abstract
Image segmentation plays an important role in vision understanding. Recently, emerging vision foundation models have continuously achieved superior performance on various tasks. Following this success, in this paper we show that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation. Specifically, SAM2-UNet adopts the Hiera backbone of SAM2 as the encoder, while the decoder uses the classic U-shaped design. Additionally, adapters are inserted into the encoder to allow parameter-efficient fine-tuning. Preliminary experiments on various downstream tasks, such as camouflaged object detection, salient object detection, marine animal segmentation, mirror detection, and polyp segmentation, demonstrate that our SAM2-UNet can simply beat existing specialized…
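The adapter idea mentioned in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names `Adapter` and `AdaptedBlock`, the bottleneck width, the residual connection, the placement of the adapter in front of the block, and the small MLP standing in for a Hiera stage are all assumptions made for illustration. The sketch only shows the general pattern: pretrained weights are frozen, and a small bottleneck module is the only part that trains.

```python
import torch
from torch import nn


class Adapter(nn.Module):
    """Bottleneck adapter: project down, apply a nonlinearity, project back up.

    Hypothetical design; the residual connection is an assumption.
    """

    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class AdaptedBlock(nn.Module):
    """Wraps a pretrained block: freezes its weights and prepends a trainable adapter."""

    def __init__(self, block: nn.Module, dim: int, bottleneck: int = 32):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False  # pretrained encoder weights stay fixed
        self.adapter = Adapter(dim, bottleneck)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(self.adapter(x))


def count_params(module: nn.Module, trainable: bool) -> int:
    """Count parameters whose requires_grad flag matches `trainable`."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad == trainable)


# Stand-in for one encoder stage (NOT the real Hiera block).
dim = 96
stage = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
adapted = AdaptedBlock(stage, dim=dim, bottleneck=32)

tokens = torch.randn(2, 10, dim)  # (batch, tokens, channels)
out = adapted(tokens)
print(out.shape)
print("trainable:", count_params(adapted, trainable=True))
print("frozen:", count_params(adapted, trainable=False))
```

With these toy sizes, the adapter holds 6,272 trainable parameters against 74,208 frozen ones in the stand-in stage. In a real encoder the frozen share would be far larger, which is what makes this style of fine-tuning parameter-efficient.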
