Bridge the Points: Graph-based Few-shot Segment Anything Semantically
Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, and Yunchao Wei

TL;DR
This paper introduces a graph-based method to improve few-shot semantic segmentation with SAM, enhancing prompt selection, reducing hyperparameters, and increasing efficiency, resulting in state-of-the-art performance on multiple datasets.
Contribution
The paper proposes a novel graph analysis approach with modules for prompt selection and mask clustering, significantly improving efficiency and accuracy in few-shot segmentation tasks.
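To make the idea of graph-based mask clustering concrete, here is a minimal sketch of one generic way to group candidate masks with a graph. It is an illustration under stated assumptions, not the authors' algorithm: this summary does not say how the paper builds its graph, so the overlap criterion, the `threshold` value, and the function names `mask_overlap_graph` and `connected_components` are all hypothetical.

```python
import numpy as np


def mask_overlap_graph(masks, threshold=0.5):
    """Build an undirected graph whose nodes are masks.

    Two masks are linked when their intersection covers at least
    `threshold` of the smaller mask. Both the criterion and the
    threshold are assumptions made for this illustration.
    """
    n = len(masks)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            intersection = np.logical_and(masks[i], masks[j]).sum()
            smaller = min(masks[i].sum(), masks[j].sum())
            if smaller > 0 and intersection / smaller >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Return the connected components of the graph as sorted index lists."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Toy example: masks 0 and 1 overlap heavily, mask 2 is isolated.
a = np.zeros((4, 4), dtype=bool)
a[0:2, 0:2] = True
b = np.zeros((4, 4), dtype=bool)
b[0:2, 0:3] = True
c = np.zeros((4, 4), dtype=bool)
c[3:4, 3:4] = True

clusters = connected_components(mask_overlap_graph([a, b, c]))
```

In this toy setting each cluster collects masks that likely describe the same region, so a downstream step could merge or filter them as a group instead of tuning a separate threshold per mask.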
Findings
Achieves 58.7% mIoU on the COCO-20i dataset
Outperforms existing models in efficiency and accuracy
Effective in cross-domain and one-shot segmentation scenarios
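For readers unfamiliar with the mIoU metric quoted above, the sketch below shows one common way to compute it for binary segmentation masks: accumulate intersection and union per class over all test episodes, then average the per-class IoU. The exact evaluation protocol behind the reported number is not described in this summary, so treat the convention and the name `mean_iou` as assumptions.

```python
import numpy as np


def mean_iou(pairs_by_class):
    """Mean IoU over classes.

    pairs_by_class maps a class id to a list of (prediction, target)
    binary masks. Intersection and union are summed per class before
    dividing, which is one common convention (assumed here).
    """
    class_ious = []
    for pairs in pairs_by_class.values():
        intersection = 0
        union = 0
        for prediction, target in pairs:
            prediction = np.asarray(prediction, dtype=bool)
            target = np.asarray(target, dtype=bool)
            intersection += np.logical_and(prediction, target).sum()
            union += np.logical_or(prediction, target).sum()
        if union > 0:  # skip classes with no foreground at all
            class_ious.append(intersection / union)
    return float(np.mean(class_ious)) if class_ious else 0.0


# Toy example: class 0 is half right, class 1 is perfect.
example = {
    0: [(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]]))],
    1: [(np.array([[1, 1], [1, 1]]), np.array([[1, 1], [1, 1]]))],
}
score = mean_iou(example)
```

Class 0 has intersection 1 and union 2 (IoU 0.5), class 1 has IoU 1.0, so the toy score is 0.75.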
Abstract
The recent advancements in large-scale pre-training techniques have significantly enhanced the capabilities of vision foundation models, notably the Segment Anything Model (SAM), which can generate precise masks based on point and box prompts. Recent studies extend SAM to Few-shot Semantic Segmentation (FSS), focusing on prompt generation for SAM-based automatic semantic segmentation. However, these methods struggle with selecting suitable prompts, require specific hyperparameter settings for different scenarios, and experience prolonged one-shot inference times due to the overuse of SAM, resulting in low efficiency and limited automation ability. To address these issues, we propose a simple yet effective approach based on graph analysis. In particular, a Positive-Negative Alignment module dynamically selects the point prompts for generating masks, especially uncovering the potential of…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Video Analysis and Summarization
Methods: Softmax · Dense Connections · Layer Normalization · Linear Layer · Multi-Head Attention · Residual Connection · Attention Is All You Need · Vision Transformer · self-DIstillation with NO labels · Segment Anything Model
