Fine-Grained Zero-Shot Object Detection
Hongxu Ma, Chenbo Zhang, Lu Zhang, Jiaogen Zhou, Jihong Guan, Shuigeng Zhou

TL;DR
This paper introduces the task of fine-grained zero-shot object detection (FG-ZSD), proposes a dedicated method called MSHC, and builds a new benchmark dataset, FGZSD-Birds, to evaluate how well detectors distinguish highly similar classes.
Contribution
The paper defines the FG-ZSD task, proposes the MSHC method with multi-level semantics-aware embedding, and constructs the first FG-ZSD dataset, FGZSD-Birds, for evaluation.
Findings
MSHC outperforms existing ZSD models on FGZSD-Birds.
FGZSD-Birds contains 148,820 images across 1,432 species.
The method effectively distinguishes fine-grained classes that differ only in minute details.
Abstract
Zero-shot object detection (ZSD) aims to leverage semantic descriptions to localize and recognize objects of both seen and unseen classes. Existing ZSD works mainly address coarse-grained object detection, where the classes are visually quite different and thus relatively easy to distinguish. In real life, however, we often face fine-grained object detection scenarios, where the classes are too similar to be easily distinguished, for example, detecting different kinds of birds, fishes, and flowers. In this paper, we propose and solve a new problem called Fine-Grained Zero-Shot Object Detection (FG-ZSD for short), which aims to detect objects of different classes with minute differences in details under the ZSD paradigm. We develop an effective method called MSHC for the FG-ZSD task, which is based on an improved two-stage detector and employs a multi-level semantics-aware…
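The general ZSD idea in the abstract, recognizing unseen classes through semantic descriptions, can be illustrated with a minimal sketch. This is not the MSHC method: it shows only the generic embedding-matching step commonly used in zero-shot recognition, where a region's visual feature is projected into the semantic space and compared with every class's semantic vector. The names `project`, `classify_region`, `CLASS_EMBEDDINGS`, and `WEIGHTS`, along with all numeric values, are hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project(feature, weights):
    """Map a visual feature into the semantic space (one row per semantic dim)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def classify_region(feature, weights, class_embeddings):
    """Score a region against seen AND unseen classes; return (best, scores)."""
    projected = project(feature, weights)
    scores = {name: cosine(projected, emb)
              for name, emb in class_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data (assumption, not from the paper).
# "finch" plays the role of an unseen class: it has a semantic vector
# but would have no training boxes in a real ZSD setup.
CLASS_EMBEDDINGS = {
    "sparrow": [1.0, 0.0, 0.2],  # seen
    "heron": [0.0, 1.0, 0.1],    # seen
    "finch": [0.9, 0.1, 0.3],    # unseen
}

# Identity projection keeps the toy example easy to follow; in practice
# this mapping would be learned from the seen classes.
WEIGHTS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    label, scores = classify_region([0.9, 0.1, 0.3], WEIGHTS, CLASS_EMBEDDINGS)
    print(label, scores)
```

In this toy setup the semantic vectors for "sparrow" and "finch" are nearly parallel, so their scores for a finch-like region are close. That small margin is the difficulty the fine-grained setting raises when classes differ only in minute details.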
Taxonomy
Topics: Advanced Neural Network Applications · Image and Object Detection Techniques · Advanced X-ray and CT Imaging
