Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts

Zhiwei Lin; Yongtao Wang; Zhi Tang

arXiv:2410.05963 · cs.CV · October 10, 2024 · 2 citations

TL;DR

VL-SAM is a training-free framework that combines vision-language and segmentation models to detect and segment unseen objects in open-world scenarios without requiring object category inputs.

Contribution

This paper introduces VL-SAM, a novel training-free approach that leverages attention maps from pre-trained models for open-ended object detection and segmentation.

Findings

01 Outperforms previous open-ended detection methods on the LVIS dataset

02 Provides additional instance segmentation masks without training

03 Demonstrates strong generalization across different models and datasets

Abstract

Existing perception models achieve great success by learning from large amounts of labeled data, but they still struggle with open-world scenarios. To alleviate this issue, researchers have introduced open-set perception tasks to detect or segment objects unseen in the training set. However, these models require predefined object categories as inputs during inference, which are not available in real-world scenarios. Recently, researchers have posed a new and more practical problem, i.e., open-ended object detection, which discovers unseen objects without any object categories as inputs. In this paper, we present VL-SAM, a training-free framework that combines the generalized object recognition model (i.e., Vision-Language Model) with the generalized object localization model (i.e., Segment-Anything Model) to address the open-ended object detection and segmentation task.…
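
The abstract is truncated before it explains how the two models are connected, so the following is only an illustrative sketch of the general "attention as prompts" idea, not the authors' method. It assumes a 2D attention map over image patches is already available and converts its strongest cells into point prompts in pixel coordinates, the kind of input a promptable segmentation model accepts. The function name `attention_to_point_prompts` and the `top_k` and `threshold` parameters are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def attention_to_point_prompts(
    attn: List[List[float]],
    image_size: Tuple[int, int],
    top_k: int = 3,
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Turn a 2D attention map into (x, y) point prompts.

    Hypothetical helper: the selection rule (min-max normalization,
    a fixed threshold, then top-k) is an assumption for illustration.
    """
    rows, cols = len(attn), len(attn[0])
    flat = [v for row in attn for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A uniform map carries no localization signal.
        return []

    width, height = image_size
    scored = []
    for r in range(rows):
        for c in range(cols):
            # Min-max normalize each cell to [0, 1].
            score = (attn[r][c] - lo) / span
            if score >= threshold:
                # Map the cell centre back to pixel coordinates.
                x = int((c + 0.5) * width / cols)
                y = int((r + 0.5) * height / rows)
                scored.append((score, x, y))

    # Keep the strongest cells first; sorting is stable for ties.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(x, y) for _, x, y in scored[:top_k]]
```

In a full pipeline, the returned points would be passed to the segmentation model as positive prompts; that step is omitted here because it depends on the specific model interface.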

Videos

Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Image and Object Detection Techniques

Methods: Softmax · Attention Is All You Need · Segment Anything Model