Multimodal Query-guided Object Localization
Aditay Tripathi, Rajath R Dani, Anand Mishra, Anirban Chakraborty

TL;DR
This paper introduces a multimodal approach for open-set object localization using hand-drawn sketches and textual descriptions, employing novel attention and scoring techniques to improve localization accuracy despite domain gaps.
Contribution
It proposes a new cross-modal attention scheme and orthogonal projection-based scoring method for effective multimodal object localization in open-set scenarios.
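The summary names an orthogonal projection-based scoring method but does not give its formula. As a rough illustration of the general idea only, the sketch below splits a candidate region feature into a component parallel to a query feature and a component orthogonal to it, then scores the region by how much of it lies along the query. The function names, the plain-list feature vectors, and the exact scoring ratio are all assumptions made for this example and should not be read as the paper's actual scoring function.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences of floats."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_projection_score(region, query):
    """Hypothetical score in [-1, 1] (an assumption, not the paper's formula).

    The region feature is decomposed as region = parallel + orthogonal,
    where `parallel` is its orthogonal projection onto the query direction.
    The score is the share of the parallel component, signed by direction.
    """
    q_norm_sq = dot(query, query)
    if q_norm_sq == 0.0:
        raise ValueError("query feature must be non-zero")

    # Projection coefficient of region onto the query direction.
    coeff = dot(region, query) / q_norm_sq
    parallel = [coeff * q for q in query]
    orthogonal = [r - p for r, p in zip(region, parallel)]

    par_norm = math.sqrt(dot(parallel, parallel))
    orth_norm = math.sqrt(dot(orthogonal, orthogonal))
    total = par_norm + orth_norm
    if total == 0.0:
        return 0.0  # zero region feature carries no evidence either way

    sign = 1.0 if coeff >= 0 else -1.0
    return sign * par_norm / total


if __name__ == "__main__":
    query_feature = [1.0, 0.0]  # toy stand-in for a fused sketch + gloss feature
    for region_feature in ([2.0, 0.0], [1.0, 1.0], [0.0, 3.0]):
        print(region_feature, orthogonal_projection_score(region_feature, query_feature))
```

In this toy setup a region feature aligned with the query scores highest, and one entirely orthogonal to it scores zero. In a real system the features would come from learned encoders rather than hand-written lists.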
Findings
Improved localization accuracy over baseline methods.
Effective handling of domain gaps between sketches, text, and images.
Demonstrated robustness in open-set object localization tasks.
Abstract
Consider a scenario in one-shot query-guided object localization where neither an image of the object nor the object category name is available as a query. In such a scenario, a hand-drawn sketch of the object could be a choice for a query. However, hand-drawn crude sketches alone, when used as queries, might be ambiguous for object localization, e.g., a sketch of a laptop could be confused for a sofa. On the other hand, a linguistic definition of the category, e.g., "a small portable computer small enough to use in your lap", along with the sketch query, gives better visual and semantic cues for object localization. In this work, we present a multimodal query-guided object localization approach under the challenging open-set setting. In particular, we use queries from two modalities, namely, hand-drawn sketch and description of the object (also known as gloss), to perform object…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
