TL;DR
This paper introduces a zero-shot object detection method that combines semantic attributes with visual features, enabling the detection of unseen objects while maintaining efficiency on seen classes.
Contribution
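It presents a novel end-to-end model that fuses semantic attribute prediction with visual features to propose bounding boxes for seen and unseen classes. Semantic features are used during training, but the method does not require semantic information about unseen classes at test time.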
Findings
Significant improvement in average precision for unseen classes on PASCAL VOC.
Maintains YOLOv2 efficiency for seen classes.
Effective detection of unseen objects through semantic-visual fusion.
Abstract
As we move towards large-scale object detection, it is unrealistic to expect annotated training data, in the form of bounding box annotations around objects, for all object classes at sufficient scale, and so methods capable of unseen object detection are required. We propose a novel zero-shot method based on training an end-to-end model that fuses semantic attribute prediction with visual features to propose object bounding boxes for seen and unseen classes. While we utilize semantic features during training, our method is agnostic to semantic information for unseen classes at test time. Our method retains the efficiency and effectiveness of YOLOv2 for objects seen during training, while improving its performance for novel and unseen objects. The ability of state-of-the-art detection methods to learn discriminative object features to reject background proposals also limits their…
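To make the fusion idea in the abstract concrete, here is a minimal sketch of how predicted semantic attributes might be combined with visual features to score a single grid cell for objectness. It is an illustration only, not the paper's architecture: the abstract does not specify the fusion operator, so concatenation is an assumption, and the names `fused_cell_score`, `attr_weights`, and `obj_weights` are hypothetical. Each weight vector stands in for a 1x1 convolution filter, which acts as a per-cell linear map.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def fused_cell_score(
    visual: Sequence[float],
    attr_weights: Sequence[Sequence[float]],
    obj_weights: Sequence[float],
) -> Tuple[float, List[float]]:
    """Score one grid cell for objectness from fused features.

    visual       -- visual feature vector for the cell (length C)
    attr_weights -- one weight vector (length C) per semantic attribute
    obj_weights  -- weights over the fused vector (length C + A)

    Assumption: fusion is plain concatenation of visual features and
    predicted attribute probabilities; the paper may fuse differently.
    """
    # Predict each semantic attribute from the visual features alone,
    # so no class-level semantic input is needed at test time.
    attrs = [sigmoid(dot(w, visual)) for w in attr_weights]
    # Fuse by concatenation, then score objectness on the fused vector.
    fused = list(visual) + attrs
    return sigmoid(dot(obj_weights, fused)), attrs


visual = [0.5, -1.0, 2.0]
attr_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
obj_weights = [0.1, 0.1, 0.1, 1.0, 1.0]
score, attrs = fused_cell_score(visual, attr_weights, obj_weights)
```

Because the attributes are predicted from the image rather than looked up per class, a cell covering an unseen object can still receive a high objectness score if it exhibits attributes learned from seen classes. That is the intuition the abstract describes.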
Taxonomy
Methods: Average Pooling · Global Average Pooling · 1x1 Convolution · Batch Normalization · Max Pooling · Softmax · Convolution · Darknet-19 · YOLOv2
