Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes

Xiaoqi Zhao; Youwei Pang; Shijie Chang; Yuan Zhao; Lihe Zhang; Chenyang Yu; Hanqi Liu; Jiaming Zuo; Jinsong Ouyang; Weisi Lin; Georges El Fakhri; Huchuan Lu; Xiaofeng Liu

arXiv:2412.01240·cs.CV·August 27, 2025

TL;DR

This paper thoroughly evaluates SAM and SAM 2 on diverse context-dependent concepts across multiple modalities, revealing their strengths and limitations in understanding complex visual contexts and guiding future segmentation model development.

Contribution

It introduces a comprehensive evaluation framework for SAM and SAM 2 on 11 context-dependent concepts across various scenes and modalities, including prompt strategies and robustness testing.

Findings

01. SAM and SAM 2 perform well on context-independent concepts.

02. Performance varies significantly on context-dependent concepts.

03. Prompt robustness impacts segmentation accuracy in real-world scenarios.
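
The listing does not say how the paper measures prompt robustness, so the following is only a minimal sketch of one way such a test could be set up: perturb a box prompt by a growing amount and track the mean mask IoU against the ground truth. Every name here (mask_iou, jitter_box, toy_segmenter, robustness_curve) is hypothetical, and toy_segmenter is a stand-in that simply fills the prompted box; it is not SAM, SAM 2, or the authors' protocol.

```python
import random


def mask_iou(pred, gt):
    """Intersection-over-union of two binary masks given as nested lists."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def jitter_box(box, scale, rng):
    """Shift each box edge by up to `scale` times the box side (assumed noise model)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    nx0 = x0 + rng.uniform(-scale, scale) * w
    ny0 = y0 + rng.uniform(-scale, scale) * h
    nx1 = x1 + rng.uniform(-scale, scale) * w
    ny1 = y1 + rng.uniform(-scale, scale) * h
    return (min(nx0, nx1), min(ny0, ny1), max(nx0, nx1), max(ny0, ny1))


def toy_segmenter(box, height, width):
    """Hypothetical stand-in for a promptable model: returns the filled box as a mask."""
    x0, y0, x1, y1 = box
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


def robustness_curve(gt_box, height, width, scales, trials=20, seed=0):
    """Mean IoU against the clean-prompt mask for each perturbation scale."""
    rng = random.Random(seed)
    gt = toy_segmenter(gt_box, height, width)
    curve = {}
    for scale in scales:
        ious = [
            mask_iou(toy_segmenter(jitter_box(gt_box, scale, rng), height, width), gt)
            for _ in range(trials)
        ]
        curve[scale] = sum(ious) / trials
    return curve


curve = robustness_curve((8, 8, 24, 24), 32, 32, scales=[0.0, 0.1, 0.3])
for scale, mean_iou in curve.items():
    print(f"jitter scale {scale:.1f}: mean IoU {mean_iou:.3f}")
```

With a real model, toy_segmenter would be swapped for a call that takes an image plus the perturbed prompt, and the reference mask would come from dataset annotations rather than from the clean prompt.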

Abstract

As large-scale foundation models trained on billions of image-mask pairs covering a vast diversity of scenes, objects, and contexts, SAM and its upgraded version, SAM 2, have significantly influenced multiple fields within computer vision. Leveraging such unprecedented data diversity, they exhibit strong open-world segmentation capabilities, with SAM 2 further enhancing these capabilities to support high-quality video segmentation. While SAMs (SAM and SAM 2) have demonstrated excellent performance in segmenting context-independent concepts like people, cars, and roads, they overlook more challenging context-dependent (CD) concepts, such as visual saliency, camouflage, industrial defects, and medical lesions. CD concepts rely heavily on global and local contextual information, making them susceptible to shifts in different contexts, which requires strong discriminative capabilities from…

Code & Models

Repositories

Xiaoqi-Zhao-DLUT/GateNet-RGB-Saliency
pytorch

Taxonomy

Topics: Data Quality and Management · Software Engineering Techniques and Practices · Software Engineering Research

Methods: Segment Anything Model