Open-Vocabulary Object Detection via Language Hierarchy
Jiaxing Huang, Jingyi Zhang, Kai Jiang, Shijian Lu

TL;DR
This paper introduces Language Hierarchical Self-training (LHST), a novel approach that incorporates language hierarchy into weakly-supervised object detection to improve generalization across multiple datasets.
Contribution
The paper presents LHST, which uses language hierarchy to expand image-level labels and to co-regularize the expanded labels with self-training, improving weakly-supervised detection and helping bridge vocabulary gaps.
Findings
Achieves superior generalization on 14 datasets.
Effectively mitigates image-to-box label mismatch.
Improves detection accuracy with language hierarchy integration.
Abstract
Recent studies on generalizable object detection have attracted increasing attention with additional weak supervision from large-scale datasets with image-level labels. However, weakly-supervised detection learning often suffers from image-to-box label mismatch, i.e., image-level labels do not convey precise object information. We design Language Hierarchical Self-training (LHST) that introduces language hierarchy into weakly-supervised detector training for learning more generalizable detectors. LHST expands the image-level labels with language hierarchy and enables co-regularization between the expanded labels and self-training. Specifically, the expanded labels regularize self-training by providing richer supervision and mitigating the image-to-box label mismatch, while self-training allows assessing and selecting the expanded labels according to the predicted reliability. In…
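The two ideas in the abstract, label expansion and reliability-based selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy hierarchy `PARENT_OF`, the function names `expand_labels` and `weight_labels`, the choice to expand toward ancestor categories, the `threshold` value, and the use of a per-class detector score as the "predicted reliability" are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy hierarchy (child -> parent). A real system might derive
# this from a lexical database; the entries here are illustrative only.
PARENT_OF: Dict[str, str] = {
    "husky": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}


def expand_labels(labels: Iterable[str], parent_of: Dict[str, str]) -> Set[str]:
    """Return the image-level labels plus all of their ancestors.

    Assumes the hierarchy is acyclic, so each walk toward the root terminates.
    """
    labels = list(labels)
    expanded = set(labels)
    for label in labels:
        node = label
        while node in parent_of:
            node = parent_of[node]
            expanded.add(node)
    return expanded


def weight_labels(
    original: Set[str],
    expanded: Set[str],
    reliability: Dict[str, float],
    threshold: float = 0.3,
) -> Dict[str, float]:
    """Assign a training weight to each label.

    Original image-level labels keep full weight. Expanded labels are kept
    only if the detector's predicted reliability (assumed here to be a
    per-class confidence in [0, 1]) reaches the threshold, and are weighted
    by that reliability.
    """
    weights: Dict[str, float] = {}
    for label in expanded:
        if label in original:
            weights[label] = 1.0
        else:
            score = reliability.get(label, 0.0)
            if score >= threshold:
                weights[label] = score
    return weights


original = {"husky"}
expanded = expand_labels(original, PARENT_OF)
# Hypothetical detector confidences for the expanded classes on one image.
weights = weight_labels(original, expanded, {"dog": 0.8, "mammal": 0.2})
print(sorted(weights.items()))
```

In this sketch the expanded labels add supervision beyond the single image-level label, while the detector's own predictions decide which expanded labels are trusted, mirroring the co-regularization described in the abstract at a very coarse level.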
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need
