SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks

Xinyu Xiong; Zihuang Wu; Lei Zhang; Lei Lu; Ming Li; Guanbin Li

arXiv:2508.03566 · cs.CV · August 6, 2025

TL;DR

SAM2-UNeXT improves the adaptation of foundation models to high-resolution segmentation tasks by integrating an auxiliary DINOv2 encoder and a dual-resolution strategy, achieving superior results across four benchmarks with a simple architecture.

Contribution

It introduces SAM2-UNeXT, a framework that builds on SAM2-UNet and extends SAM2 with an auxiliary DINOv2 encoder and a dual-resolution strategy for improved segmentation performance.
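To make the dual-resolution idea concrete, the sketch below does the shape bookkeeping for two encoders fed at different input sizes. The numbers are assumptions, not values confirmed by this page: a 1024×1024 input to the SAM2 (Hiera) encoder producing four feature levels at strides 4, 8, 16 and 32, and a 448×448 input to a DINOv2 ViT with 14×14 patches. The function name `dual_resolution_shapes` is hypothetical.

```python
from typing import Dict, Tuple


def dual_resolution_shapes(
    sam_res: int = 1024,            # assumed SAM2 input size
    dino_res: int = 448,            # assumed DINOv2 input size
    dino_patch: int = 14,           # DINOv2 ViT patch size
    sam_strides: Tuple[int, ...] = (4, 8, 16, 32),  # assumed SAM2 feature strides
) -> Dict[str, object]:
    """Hypothetical helper: spatial grid sizes of both encoders and how they relate."""
    if dino_res % dino_patch:
        raise ValueError("DINOv2 input size must be a multiple of its patch size")
    # DINOv2 yields one token per patch, i.e. a single (dino_res / patch)^2 grid
    dino_grid = dino_res // dino_patch
    # SAM2's hierarchical encoder yields one grid per stride
    sam_grids = [sam_res // s for s in sam_strides]
    # Factor by which the DINOv2 grid would be resized to match each SAM2 level
    resize_factors = [g / dino_grid for g in sam_grids]
    return {"sam_grids": sam_grids, "dino_grid": dino_grid, "resize_factors": resize_factors}


print(dual_resolution_shapes())
# {'sam_grids': [256, 128, 64, 32], 'dino_grid': 32, 'resize_factors': [8.0, 4.0, 2.0, 1.0]}
```

Under these assumptions the 32×32 DINOv2 grid lines up exactly with the coarsest SAM2 level, so its features would only need to be resized by factors of 8, 4, 2 and 1 to meet the four levels; choosing the DINOv2 resolution as a multiple of the patch size keeps these factors integral while the auxiliary encoder runs at a much lower cost.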

Findings

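01 Outperforms existing methods on four benchmarks: dichotomous image segmentation, camouflaged object detection, marine animal segmentation, and remote sensing saliency detection

02 Achieves higher accuracy with a simpler architecture, relaxing the need for complex decoder designs

03 Demonstrates versatility across diverse segmentation tasks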

Abstract

Recent studies have highlighted the potential of adapting the Segment Anything Model (SAM) for various downstream tasks. However, constructing a more powerful and generalizable encoder to further enhance performance remains an open challenge. In this work, we propose SAM2-UNeXT, an advanced framework that builds upon the core principles of SAM2-UNet while extending the representational capacity of SAM2 through the integration of an auxiliary DINOv2 encoder. By incorporating a dual-resolution strategy and a dense glue layer, our approach enables more accurate segmentation with a simple architecture, relaxing the need for complex decoder designs. Extensive experiments conducted on four benchmarks, including dichotomous image segmentation, camouflaged object detection, marine animal segmentation, and remote sensing saliency detection, demonstrate the superior performance of our proposed…
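The abstract names a dense glue layer without describing it. Purely as an illustration of one plausible way to merge the two encoders, the hypothetical `glue` function below upsamples a DINOv2 feature map to the size of a SAM2 level with nearest-neighbour interpolation, concatenates the channels, and mixes them with a 1×1 linear projection. This is an assumed fusion scheme, not the authors' layer; it uses plain Python lists so the data flow is visible without a deep-learning framework.

```python
from typing import List

# A feature map is stored channel-first: fmap[channel][row][col].
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every pixel `factor` times along both spatial axes."""
    return [
        [
            [row[x // factor] for x in range(len(row) * factor)]
            for row in channel
            for _ in range(factor)
        ]
        for channel in fmap
    ]


def project_1x1(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution without bias: out[o] = sum_i weights[o][i] * fmap[i]."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(w_i * fmap[i][y][x] for i, w_i in enumerate(w_row)) for x in range(width)]
            for y in range(height)
        ]
        for w_row in weights
    ]


def glue(sam_feat: FeatureMap, dino_feat: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Hypothetical fusion: resize DINOv2 features, concatenate with SAM2 features, project."""
    factor = len(sam_feat[0]) // len(dino_feat[0])
    dino_up = upsample_nearest(dino_feat, factor)
    if len(dino_up[0]) != len(sam_feat[0]) or len(dino_up[0][0]) != len(sam_feat[0][0]):
        raise ValueError("spatial sizes must match after upsampling")
    stacked = sam_feat + dino_up  # channel concatenation
    if any(len(w_row) != len(stacked) for w_row in weights):
        raise ValueError("each weight row needs one entry per input channel")
    return project_1x1(stacked, weights)


# Usage: one SAM2 channel at 2x2 and one DINOv2 channel at 1x1.
sam = [[[1.0, 2.0], [3.0, 4.0]]]
dino = [[[10.0]]]
print(glue(sam, dino, weights=[[1.0, 1.0]]))  # [[[11.0, 12.0], [13.0, 14.0]]]
```

In a real model the same three steps would typically map onto a framework's interpolation, concatenation, and 1×1 convolution operators, with learned weights; the list version only shows how the coarse DINOv2 grid is brought to each SAM2 resolution before the channels are mixed.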
