Detect an Object At Once without Fine-tuning
Junyu Hao, Jianheng Liu, Yongjia Zhao, Zuofan Chen, Qi Sun, Jinlong Chen, Jianguo Wei, Minghao Yang

TL;DR
This paper presents a novel method for instantaneously detecting previously unseen objects in images without fine-tuning, using a similarity density map and a region alignment network trained on existing datasets.
Contribution
It introduces a two-phase approach that combines a Similarity Density Map (SDM) with a Region Alignment Network (RAN) to detect previously unseen objects from one or a few example images, without additional fine-tuning.
Findings
Outperforms state-of-the-art methods on MS COCO and PASCAL VOC datasets.
Effectively detects unseen objects without fine-tuning.
Utilizes a novel SDM-RAN architecture for accurate object localization.
Abstract
When presented with one or a few photos of a previously unseen object, humans can instantly recognize it in different scenes. Although the brain mechanism behind this ability is still not fully understood, this work introduces a novel technical realization of the task. It consists of two phases: (1) generating a Similarity Density Map (SDM) by convolving the scene image with the given object image patch(es), so that the highlighted areas in the SDM indicate the object's possible locations; (2) obtaining the areas occupied by the object in the scene through a Region Alignment Network (RAN). The RAN is built on a Deep Siamese Network (DSN) backbone and, unlike traditional DSNs, aims to obtain accurate object regions by regressing the location and area differences between the ground truths and the predictions indicated by the highlighted areas in the SDM. By pre-learning…
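The first phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes grayscale NumPy arrays, uses normalized cross-correlation on raw pixels as a stand-in for the similarity measure, and the function names `similarity_density_map` and `peak_box` are hypothetical. The box read off the peak is only the kind of coarse estimate that a refinement stage such as the RAN would then correct.

```python
import numpy as np


def similarity_density_map(scene, patch):
    """Slide `patch` over `scene` and score each position.

    Assumption: normalized cross-correlation on raw pixels stands in
    for whatever similarity the paper actually uses.
    """
    sh, sw = scene.shape
    ph, pw = patch.shape
    p = patch - patch.mean()
    p_norm = np.linalg.norm(p)
    sdm = np.zeros((sh - ph + 1, sw - pw + 1))
    for y in range(sdm.shape[0]):
        for x in range(sdm.shape[1]):
            window = scene[y:y + ph, x:x + pw]
            w = window - window.mean()
            denom = np.linalg.norm(w) * p_norm
            # Flat windows have zero variance; score them as 0.
            sdm[y, x] = (w * p).sum() / denom if denom > 0 else 0.0
    return sdm


def peak_box(sdm, patch_shape):
    """Return a coarse (x, y, width, height) box at the SDM maximum."""
    y, x = np.unravel_index(np.argmax(sdm), sdm.shape)
    return (int(x), int(y), patch_shape[1], patch_shape[0])


# Toy usage: cut a patch out of a random scene and locate it again.
rng = np.random.default_rng(0)
scene = rng.random((32, 32))
patch = scene[10:18, 5:13].copy()
sdm = similarity_density_map(scene, patch)
box = peak_box(sdm, patch.shape)
```

In this toy setup the patch is an exact crop, so the peak coincides with the true location; with real photos the peak would be approximate, which is the gap the second phase is described as closing by regressing location and area differences.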
Taxonomy
Topics: Neural Networks and Applications
Methods: Siamese Network
