TL;DR
This paper presents an approach for localizing all instances of an object in a natural image from a sketch query. It addresses the abstraction and variability of hand-drawn sketches and the domain gap between sketches and natural images, and it outperforms baselines on MS-COCO and PASCAL-VOC.
Contribution
Introduces a novel cross-modal attention framework for sketch-guided object localization, capable of handling abstract sketches, unseen categories, and multiple instances.
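As a rough illustration of what cross-modal attention can mean here, the sketch below scores each image location against a sketch embedding and normalizes the scores into attention weights. It is not the paper's architecture: the function name `sketch_attention`, the flat list-of-vectors feature layout, and the scaled dot-product scoring are all assumptions made for this example.

```python
import math

def sketch_attention(image_feats, sketch_feat):
    """Return one attention weight per image location (hypothetical helper).

    image_feats: list of N feature vectors, one per image location (assumed layout).
    sketch_feat: a single feature vector describing the sketch query (assumed).
    """
    d = len(sketch_feat)
    # Scaled dot product between each location's features and the sketch embedding.
    scores = [
        sum(f * s for f, s in zip(feat, sketch_feat)) / math.sqrt(d)
        for feat in image_feats
    ]
    # Softmax over locations, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy input: three locations with 2-D features; the second matches the sketch.
feats = [[0.0, 0.1], [1.0, 2.0], [-0.5, 0.2]]
sketch = [1.0, 2.0]
weights = sketch_attention(feats, sketch)
```

In a localization pipeline, weights like these could be used to emphasize image regions that resemble the sketch before boxes are proposed. How the paper actually combines the two modalities is not described in this summary.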
Findings
Outperforms baselines on MS-COCO and PASCAL-VOC
Effective with a single sketch query
Generalizes to unseen object categories
Abstract
We introduce the novel problem of localizing all the instances of an object (seen or unseen during training) in a natural image via a sketch query. We refer to this problem as sketch-guided object localization. This problem is distinctly different from the traditional sketch-based image retrieval task, where the gallery set often contains images with only one object. Sketch-guided object localization proves to be more challenging when we consider the following: (i) the sketches used as queries are abstract representations with little information on the shape and salient attributes of the object, (ii) the sketches have significant variability, as they are hand-drawn by a diverse set of untrained human subjects, and (iii) there exists a domain gap between sketch queries and target natural images, as these are sampled from very different data distributions. To address the problem of…
