SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks
Xinyu Xiong, Zihuang Wu, Lei Zhang, Lei Lu, Ming Li, Guanbin Li

TL;DR
SAM2-UNeXT improves foundation-model adaptation for high-resolution segmentation by pairing SAM2 with an auxiliary DINOv2 encoder and a dual-resolution strategy, achieving superior results on four benchmarks with a simple architecture.
Contribution
Introduces SAM2-UNeXT, a framework that builds on SAM2-UNet and extends SAM2 with an auxiliary DINOv2 encoder, a dual-resolution strategy, and a dense glue layer to improve segmentation performance.
Findings
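Outperforms existing methods on four benchmarks: dichotomous image segmentation, camouflaged object detection, marine animal segmentation, and remote sensing saliency detection
Achieves more accurate segmentation with a simple architecture, relaxing the need for complex decoder designs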
Demonstrates versatility across diverse segmentation tasks
Abstract
Recent studies have highlighted the potential of adapting the Segment Anything Model (SAM) for various downstream tasks. However, constructing a more powerful and generalizable encoder to further enhance performance remains an open challenge. In this work, we propose SAM2-UNeXT, an advanced framework that builds upon the core principles of SAM2-UNet while extending the representational capacity of SAM2 through the integration of an auxiliary DINOv2 encoder. By incorporating a dual-resolution strategy and a dense glue layer, our approach enables more accurate segmentation with a simple architecture, relaxing the need for complex decoder designs. Extensive experiments conducted on four benchmarks, including dichotomous image segmentation, camouflaged object detection, marine animal segmentation, and remote sensing saliency detection, demonstrate the superior performance of our proposed…
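The dual-resolution, two-encoder idea in the abstract can be pictured through simple shape bookkeeping. The sketch below is illustrative only: it assumes the SAM2 branch receives a larger input than the DINOv2 branch, the example resolutions (1024 and 448) are assumptions rather than values stated in the abstract, and the function names are hypothetical. It relies only on two known properties of the backbones: SAM2's Hiera encoder produces features at strides 4, 8, 16, and 32, and DINOv2 uses 14-pixel patches. How the dense glue layer actually fuses the two streams is not described here and is not modeled.

```python
# Hypothetical shape bookkeeping for a dual-resolution, two-encoder setup.
# The input resolutions used below are illustrative assumptions.

HIERA_STRIDES = (4, 8, 16, 32)  # downsampling factors of SAM2's Hiera backbone stages
DINOV2_PATCH = 14               # DINOv2 ViT patch size in pixels


def hiera_grids(side):
    """Spatial side length of each SAM2 encoder stage for a square input."""
    return [side // s for s in HIERA_STRIDES]


def dinov2_grid(side):
    """Side length of the DINOv2 patch-token grid for a square input."""
    if side % DINOV2_PATCH:
        raise ValueError("input side must be a multiple of the patch size")
    return side // DINOV2_PATCH


def resize_factors(sam_side, dino_side):
    """Scale factor that aligns the DINOv2 grid with each SAM2 stage."""
    d = dinov2_grid(dino_side)
    return [g / d for g in hiera_grids(sam_side)]


if __name__ == "__main__":
    sam_side, dino_side = 1024, 448  # assumed example resolutions
    print(hiera_grids(sam_side))                # [256, 128, 64, 32]
    print(dinov2_grid(dino_side))               # 32
    print(resize_factors(sam_side, dino_side))  # [8.0, 4.0, 2.0, 1.0]
```

Under these assumed resolutions, the low-resolution DINOv2 token grid matches the coarsest SAM2 stage exactly and would need upsampling by 2x, 4x, and 8x to be combined with the finer stages.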
