TL;DR
This paper introduces the Zero-Shot Detection (ZSD) problem: recognizing and localizing object categories in complex scenes when no training examples of those categories are available. It proposes a novel end-to-end deep network and a new experimental protocol for the task.
Contribution
The paper proposes what the authors describe as the first end-to-end deep network for ZSD that jointly models the interplay between the visual and semantic domains, along with a new experimental protocol and a loss function designed to handle semantic noise.
Findings
Significant performance improvements over baseline methods.
Effective modeling of visual-semantic interplay in ZSD.
Validated on the challenging ILSVRC dataset.
Abstract
Current Zero-Shot Learning (ZSL) approaches are restricted to recognition of a single dominant unseen object category in a test image. We hypothesize that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complex scene, warranting both the 'recognition' and 'localization' of an unseen category. To address this limitation, we introduce a new 'Zero-Shot Detection' (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories without any training examples. We also propose a new experimental protocol for ZSD based on the highly challenging ILSVRC dataset, adhering to practical issues, e.g., the rarity of unseen objects. To the best of our knowledge, this is the first end-to-end deep network for ZSD that jointly models the interplay between visual and semantic domain…
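To make the problem setting concrete, the following is a minimal sketch of the general idea behind zero-shot detection: score each candidate box against semantic embeddings of categories that were never seen in training. It is not the paper's network or loss. The linear projection `W`, the cosine-similarity scoring, the `threshold` value, the function names, and all toy data are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def project(W, x):
    """Map a visual feature x into the semantic space with a linear map W (assumed)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def zero_shot_detect(boxes, W, unseen_embeddings, threshold=0.5):
    """Label each candidate box with its most similar unseen class.

    boxes: list of (box_coordinates, visual_feature) pairs.
    unseen_embeddings: dict mapping class name -> semantic vector.
    Boxes whose best similarity falls below `threshold` are treated as background.
    """
    detections = []
    for box, feature in boxes:
        z = project(W, feature)
        label, score = max(
            ((name, cosine(z, emb)) for name, emb in unseen_embeddings.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            detections.append((box, label, score))
    return detections


# Toy data, entirely hypothetical: 3-d visual features, 2-d semantic vectors.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
unseen = {"zebra": [1.0, 0.0], "canoe": [0.0, 1.0]}
boxes = [
    ((10, 10, 50, 50), [0.9, 0.1, 0.3]),
    ((60, 20, 90, 80), [0.1, 0.8, 0.0]),
    ((0, 0, 5, 5), [0.0, 0.0, 1.0]),  # projects to the zero vector, so it is dropped
]
detections = zero_shot_detect(boxes, W, unseen)
for box, label, score in detections:
    print(box, label, round(score, 3))
```

In this toy setup, recognition comes from the similarity to class embeddings and localization comes from the box each score is attached to, which is the pairing the ZSD setting asks for. A trained model would learn the visual-to-semantic mapping from seen classes rather than use a hand-written `W`.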
