Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation
Ryota Hinami, Shin'ichi Satoh

TL;DR
This paper introduces Query-Adaptive R-CNN and negative phrase augmentation to enable open-vocabulary object retrieval and localization from large image collections, extending detectors beyond a fixed set of pre-specified classes to objects described by a word or phrase.
Contribution
It presents a novel extension of Faster R-CNN for open-vocabulary queries and a negative phrase augmentation technique for discriminative training.
Findings
Retrieves and localizes objects from one million images in 0.5 seconds
Achieves high precision in open-vocabulary object localization
Effectively mines hard negative samples for improved training
Abstract
Thanks to the success of object detection technology, we can retrieve objects of the specified classes even from huge image collections. However, the current state-of-the-art object detectors (such as Faster R-CNN) can only handle pre-specified classes. In addition, large amounts of positive and negative visual samples are required for training. In this paper, we address the problem of open-vocabulary object retrieval and localization, where the target object is specified by a textual query (e.g., a word or phrase). We first propose Query-Adaptive R-CNN, a simple extension of Faster R-CNN adapted to open-vocabulary queries, by transforming the text embedding vector into an object classifier and localization regressor. Then, for discriminative training, we propose negative phrase augmentation (NPA) to mine hard negative samples which are visually similar to the query and at the same…
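The idea of turning a text embedding into a classifier and a localization regressor can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the transformation is parameterized, so the linear "generator" matrices, the vector sizes, and the random stand-in weights below are all assumptions made for illustration, as are the function names.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) matrix; a stand-in for a learned layer (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def query_to_detector(text_embedding, cls_generator, reg_generators):
    """Map a text embedding to classifier weights and four box-regressor weight vectors."""
    classifier = matvec(cls_generator, text_embedding)
    regressors = [matvec(g, text_embedding) for g in reg_generators]
    return classifier, regressors

def score_regions(region_features, classifier, regressors):
    """Apply the query-specific classifier and regressors to each region feature."""
    results = []
    for feat in region_features:
        score = dot(classifier, feat)                # query-vs-region relevance
        deltas = [dot(r, feat) for r in regressors]  # 4 box offsets per region
        results.append((score, deltas))
    return results

rng = random.Random(0)
TEXT_DIM, FEAT_DIM = 4, 6  # hypothetical sizes, chosen only to keep the example small

cls_generator = make_linear(TEXT_DIM, FEAT_DIM, rng)
reg_generators = [make_linear(TEXT_DIM, FEAT_DIM, rng) for _ in range(4)]

# Stand-ins for a phrase embedding and for per-region features from a detector backbone.
text_embedding = [rng.uniform(-1.0, 1.0) for _ in range(TEXT_DIM)]
region_features = [[rng.uniform(-1.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(3)]

classifier, regressors = query_to_detector(text_embedding, cls_generator, reg_generators)
detections = score_regions(region_features, classifier, regressors)

# Rank regions by score, highest first, as a retrieval system would.
ranked = sorted(range(len(detections)), key=lambda i: detections[i][0], reverse=True)
```

Under these assumptions, the region features do not depend on the query, so they can be computed once per image collection; only the small query-to-detector step and the dot products run at query time.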
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Region Proposal Network · Softmax · Convolution · RoIPool · Faster R-CNN
