A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Chaoning Zhang, Joseph Cho, Fachrina Dewi Puspitasari, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Lik-Hang Lee, Sung-Ho Bae, Choong Seon Hong

TL;DR
This survey reviews the development, capabilities, and limitations of the Segment Anything Model (SAM) family, emphasizing its impact on image and video segmentation and outlining future research directions.
Contribution
It provides a comprehensive overview of SAM and SAM 2, analyzing their advancements, applications, and challenges, and suggests future research avenues in segmentation technology.
Findings
SAM demonstrates versatility across various applications.
Challenges remain in high granularity and prompt-less scenarios.
Future directions include domain-specific adaptations and improved memory mechanisms.
Abstract
The Segment Anything Model (SAM), developed by Meta AI Research, represents a significant breakthrough in computer vision, offering a robust framework for image and video segmentation. This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlighting their advancements in granularity and contextual understanding. Our study demonstrates SAM's versatility across a wide range of applications while identifying areas where improvements are needed, particularly in scenarios requiring high granularity and in the absence of explicit prompts. By mapping the evolution and capabilities of SAM models, we offer insights into their strengths and limitations and suggest future research directions, including domain-specific adaptations and enhanced memory and propagation mechanisms. We believe that this survey comprehensively covers the breadth of SAM's…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
Methods: Multi-Head Attention · Attention Is All You Need · Segment Anything Model · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer · Diffusion
