Detect Only What You Specify: Object Detection with Linguistic Target
Moyuru Yamada

TL;DR
This paper introduces a language-guided object detection approach that detects only specified objects based on natural language input, enabling more targeted and context-aware detection in images.
Contribution
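The paper proposes the Language-Targeted Detector (LTD), a Transformer-based model that incorporates linguistic context for selective object detection. Traditional detectors only learn to classify categories and do not treat them as linguistic symbols. LTD instead conditions detection on the language input.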
Findings
LTD improves detection accuracy when guided by textual input.
The model effectively grounds language to visual objects in the COCO dataset.
LTD demonstrates the ability to detect only specified objects based on language cues.
Abstract
Object detection is a computer vision task of predicting a set of bounding boxes and category labels for each object of interest in a given image. Each category corresponds to a linguistic symbol such as 'dog' or 'person', and there should be relationships among these symbols. However, an object detector only learns to classify the categories and does not treat them as linguistic symbols. Multi-modal models often use a pre-trained object detector to extract object features from the image, but these models are separate from the detector, and the extracted visual features do not change with their linguistic input. We rethink object detection as a vision-and-language reasoning task. We then propose the targeted detection task, where detection targets are given in natural language and the goal is to detect all of the target objects, and only those, in a given image. No detections are produced if the…
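The task definition can be illustrated with a minimal sketch. It shows what a correct output looks like for a targeted detection query. It is not LTD itself. The expected output is derived from ordinary COCO-style annotations by keeping only the objects whose category is among the targets. The sketch assumes the natural-language query has already been resolved to a set of category names. `Box`, `Annotation`, and `targeted_ground_truth` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Box:
    """Axis-aligned box as (x, y, width, height) in pixels, COCO-style."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Annotation:
    """One annotated object: a linguistic category label plus its box."""
    category: str  # e.g. "dog" or "person"
    box: Box


def targeted_ground_truth(annotations: List[Annotation], targets: Set[str]) -> List[Annotation]:
    """Return every annotated object whose category is a requested target, and nothing else.

    Hypothetical helper: `targets` is assumed to be an already-parsed set of
    category names, whereas the task itself specifies targets in natural language.
    If no target category is present, the result is an empty list.
    """
    return [a for a in annotations if a.category in targets]


image_annotations = [
    Annotation("dog", Box(10, 20, 50, 40)),
    Annotation("person", Box(80, 5, 30, 90)),
    Annotation("dog", Box(120, 60, 45, 35)),
]

print(len(targeted_ground_truth(image_annotations, {"dog"})))  # 2
print(targeted_ground_truth(image_annotations, {"cat"}))       # []
```

A conventional detector would be expected to return all three objects regardless of the query. Under the targeted task, the same image yields different expected outputs depending on which targets the language specifies.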
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
