Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO
Fuseini Mumuni, Alhassan Mumuni

TL;DR
This paper investigates the use of Grounding DINO and SAM for automated image annotation across various domains, revealing predictable false positive patterns and demonstrating significant improvements in segmentation accuracy and efficiency.
Contribution
It provides empirical insights into false positive patterns in REC-based detection and demonstrates how size-based filtering improves segmentation in specialized domains.
Findings
False positives tend to be large and can be filtered by size.
Size-based filtering reduces false positives and improves segmentation accuracy.
SAM significantly enhances annotation efficiency and accuracy.
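The size-based filtering idea in the findings above can be illustrated with a minimal sketch. It assumes detections arrive as pixel-coordinate boxes `(x_min, y_min, x_max, y_max)` and discards any box that covers more than a chosen fraction of the image before it would be passed on as a segmentation prompt. The function names and the `max_area_fraction` default of 0.5 are hypothetical and chosen only for illustration. The summary does not state the paper's actual filtering criterion or threshold, so this should not be read as the authors' implementation.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_area_fraction(box: Box, image_width: int, image_height: int) -> float:
    """Return the fraction of the image area covered by the box."""
    x_min, y_min, x_max, y_max = box
    width = max(0.0, x_max - x_min)
    height = max(0.0, y_max - y_min)
    return (width * height) / float(image_width * image_height)


def filter_boxes_by_size(
    boxes: List[Box],
    image_width: int,
    image_height: int,
    max_area_fraction: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[Box]:
    """Keep only boxes whose relative area does not exceed the threshold.

    Boxes above the threshold are treated as likely false positives,
    following the observation that false positives tend to be large.
    """
    return [
        box
        for box in boxes
        if box_area_fraction(box, image_width, image_height) <= max_area_fraction
    ]


# Usage: one small box and one box spanning the whole 100x100 image.
detections = [(10.0, 10.0, 50.0, 50.0), (0.0, 0.0, 100.0, 100.0)]
kept = filter_boxes_by_size(detections, image_width=100, image_height=100)
```

In practice the threshold would need tuning per domain, since the expected size of a true target (for example, an organ versus a small lesion) varies widely.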
Abstract
Grounding DINO and the Segment Anything Model (SAM) have achieved impressive performance in zero-shot object detection and image segmentation, respectively. Together, they have great potential to revolutionize applications in zero-shot semantic segmentation and data annotation. Yet, in specialized domains like medical image segmentation, objects of interest (e.g., organs, tissues, and tumors) may not fall under existing class names. To address this problem, the referring expression comprehension (REC) ability of Grounding DINO is leveraged to detect arbitrary targets by their language descriptions. However, recent studies have highlighted a severe limitation of the REC framework in this application setting: its tendency to make false positive predictions when the target is absent from the given image. And, while this bottleneck is central to the prospect of open-set semantic…
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Multi-Head Attention · Vision Transformer · Segment Anything Model · self-DIstillation with NO labels
