Exploration of visual prompt in Grounded pre-trained open-set detection
Qibo Chen, Weizhong Jin, Shuchang Li, Mengdi Liu, Li Yu, Jian Jiang, Xiaozheng Wang

TL;DR
This paper introduces a novel visual prompt approach for open-set object detection that learns from a few labeled images, improving generalization to new categories without manual prompt design.
Contribution
The paper proposes a statistical-based visual prompt construction method and task-specific similarity dictionaries to enhance open-set detection performance.
Findings
Outperforms existing prompt learning methods on the ODinW dataset
Performs more consistently in combinatorial inference
Effectively models new categories with few labeled images
Abstract
Text prompts are crucial for generalizing pre-trained open-set object detection models to new categories. However, current methods for text prompts are limited as they require manual feedback when generalizing to new categories, which restricts their ability to model complex scenes, often leading to incorrect detection results. To address this limitation, we propose a novel visual prompt method that learns new category knowledge from a few labeled images, which generalizes the pre-trained detection model to the new category. To allow visual prompts to represent new categories adequately, we propose a statistical-based prompt construction module that is not limited by predefined vocabulary lengths, thus allowing more vectors to be used when representing categories. We further utilize the category dictionaries in the pre-training dataset to design task-specific similarity dictionaries,…
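The abstract does not spell out how the statistics-based construction module works, so the following is only a minimal sketch under our own assumptions, not the authors' module. It assumes that each new category is represented by several prompt vectors derived from per-dimension statistics (the mean, and the mean plus or minus one standard deviation) of a few labeled region features, and that a region is assigned to the category with the highest cosine similarity to any of its prompt vectors. The point it illustrates is that the number of vectors per category is set by a free parameter rather than by a fixed text-vocabulary length. The names `build_visual_prompt`, `classify_region`, and `scales` and the toy feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_std(features: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Per-dimension mean and population standard deviation of a few feature vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in features) / n) for d in range(dim)]
    return mean, std


def build_visual_prompt(features: Sequence[Vector],
                        scales: Sequence[float] = (0.0, 1.0, -1.0)) -> List[Vector]:
    """Hypothetical statistics-based prompt: one vector per scale, mean + scale * std.

    The number of prompt vectors equals len(scales), so it is not tied to a
    predefined vocabulary length (an assumption for illustration only).
    """
    mean, std = mean_std(features)
    return [[m + s * sd for m, sd in zip(mean, std)] for s in scales]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify_region(region: Vector, prompts: Dict[str, List[Vector]]) -> str:
    """Pick the category whose best-matching prompt vector is most similar to the region."""
    return max(prompts, key=lambda c: max(cosine(region, p) for p in prompts[c]))


# Toy usage: two new categories, two labeled region features ("shots") each.
shots = {
    "drone": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "kayak": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]],
}
prompts = {category: build_visual_prompt(feats) for category, feats in shots.items()}
print(classify_region([0.95, 0.15, 0.05], prompts))  # drone
```

Adding more entries to `scales` (or swapping in other statistics) yields more vectors per category without any change to the matching step, which is the flexibility the abstract attributes to its construction module.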
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
