Semantic-aware SAM for Point-Prompted Instance Segmentation
Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han

TL;DR
This paper presents SAPNet, a novel semantic-aware segmentation network that enhances SAM's class-specific segmentation using point prompts, MIL, and strategic proposals, achieving promising results on Pascal VOC and COCO.
Contribution
Introduces SAPNet, integrating MIL and point prompts with SAM for cost-effective, category-specific instance segmentation with novel strategies to address weak supervision challenges.
Findings
SAPNet outperforms baseline methods on Pascal VOC and COCO datasets.
The proposed strategies effectively mitigate 'group' and 'local' issues in weakly supervised segmentation.
Semantic matching capabilities are significantly improved with the new framework.
Abstract
Single-point annotation in visual tasks, with the goal of minimizing labelling costs, is becoming increasingly prominent in research. Recently, visual foundation models, such as Segment Anything (SAM), have gained widespread usage due to their robust zero-shot capabilities and exceptional annotation performance. However, SAM's class-agnostic output and high confidence in local segmentation introduce 'semantic ambiguity', posing a challenge for precise category-specific segmentation. In this paper, we introduce a cost-effective category-specific segmenter using SAM. To tackle this challenge, we have devised a Semantic-Aware Instance Segmentation Network (SAPNet) that integrates Multiple Instance Learning (MIL) with matching capability and SAM with point prompts. SAPNet strategically selects the most representative mask proposals generated by SAM to supervise segmentation, with a specific…
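As a rough illustration of how MIL can pick one representative mask out of the several proposals SAM returns for a single point, the sketch below scores a "bag" of proposals with a generic two-branch (WSDDN-style) MIL head. This is an assumption made for illustration, not SAPNet's actual selection rule: the function name `select_proposal`, the two logit inputs, and the scoring formula are all hypothetical, and the per-proposal logits are assumed to come from some feature extractor applied to each mask.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def select_proposal(cls_logits, ins_logits, label):
    """Pick the most representative proposal in a bag (hypothetical sketch).

    cls_logits, ins_logits: arrays of shape (num_proposals, num_classes),
        assumed to be produced by two heads over per-mask features.
    label: integer class index of the annotated point.

    Returns (index of the best proposal, bag-level class scores).
    """
    # Classification branch: which class does each proposal look like?
    cls_prob = softmax(cls_logits, axis=1)
    # Instance branch: which proposal best represents each class?
    ins_prob = softmax(ins_logits, axis=0)
    # Per-proposal, per-class score is the product of the two branches.
    scores = cls_prob * ins_prob
    # Bag-level score per class; each entry lies in [0, 1] and can be
    # trained against the point's category label.
    bag_score = scores.sum(axis=0)
    best = int(np.argmax(scores[:, label]))
    return best, bag_score
```

In use, the bag would hold the candidate masks SAM produces for one annotated point, and the selected proposal would serve as the pseudo-label for that instance. Only the category label of the point is needed as supervision in this setup.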
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Focus · Segment Anything Model
