Bridging the Gap Between Object Detection and User Intent via Query-Modulation
Marco Fornoni, Chaochao Yan, Liangchen Luo, Kimberly Wilber, Alex Stark, Yin Cui, Boqing Gong, Andrew Howard

TL;DR
This paper introduces query-modulated object detectors that incorporate user intent via query embeddings, significantly improving detection accuracy and versatility on mobile devices compared to traditional models.
Contribution
It presents a method to modulate mobile object detectors with an embedding of the user's query, improving detection of the queried object and enabling a single model to both localize the query and perform standard detection.
Findings
Query-modulated detectors outperform standard detectors on user query tasks.
They surpass specialized referring expression recognition systems in accuracy.
They can jointly localize user queries and perform standard detection, exceeding baseline models.
Abstract
When interacting with objects through cameras, or pictures, users often have a specific intent. For example, they may want to perform a visual search. With most object detection models relying on image pixels as their sole input, undesired results are not uncommon. Most typically: lack of a high-confidence detection on the object of interest, or detection with a wrong class label. The issue is especially severe when operating capacity-constrained mobile object detectors on-device. In this paper we investigate techniques to modulate mobile detectors to explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard detectors, query-modulated detectors show superior performance at detecting objects for a given user query. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors also…
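The abstract mentions training data synthesized from standard object detection annotations but, in the portion shown here, does not spell out the procedure. One plausible reading, offered only as a minimal sketch, is to sample a class present in an image as the query and keep the boxes of that class as targets. The function name `synthesize_query_example`, the box convention, and the output dictionary layout are hypothetical and not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical box convention: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]
Annotation = Tuple[str, Box]  # (class label, bounding box)


def synthesize_query_example(
    annotations: List[Annotation], rng: random.Random
) -> Dict[str, object]:
    """Turn one image's detection annotations into a (query, targets) pair.

    Assumption: the query is simply a class label sampled from the labels
    present in the image; the paper may construct queries differently.
    """
    if not annotations:
        raise ValueError("need at least one annotated object to form a query")

    # Sample uniformly over distinct labels so frequent classes in the
    # image are not over-represented as queries.
    labels = sorted({label for label, _ in annotations})
    query = rng.choice(labels)

    # Only boxes matching the sampled query become detection targets.
    targets = [box for label, box in annotations if label == query]
    return {"query": query, "target_boxes": targets}


if __name__ == "__main__":
    image_annotations = [
        ("dog", (0.1, 0.2, 0.4, 0.6)),
        ("cat", (0.5, 0.5, 0.9, 0.9)),
        ("dog", (0.6, 0.1, 0.8, 0.3)),
    ]
    example = synthesize_query_example(image_annotations, random.Random(0))
    print(example["query"], example["target_boxes"])
```

In a real pipeline the sampled label would be mapped to a query embedding before being fed to the detector; that step is omitted here because the source does not describe it in the text above.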
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
