Open-Vocabulary Object Detection via Language Hierarchy

Jiaxing Huang; Jingyi Zhang; Kai Jiang; Shijian Lu

arXiv:2410.20371 · cs.CV · October 29, 2024

TL;DR

This paper introduces Language Hierarchical Self-training (LHST), a novel approach that incorporates language hierarchy into weakly-supervised object detection to improve generalization across multiple datasets.

Contribution

The paper presents LHST, which uses a language hierarchy to expand image-level labels and to co-regularize the expanded labels with self-training. This improves weakly-supervised detection and bridges vocabulary gaps across datasets.
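To make "label expansion with a language hierarchy" concrete, here is a minimal sketch. It assumes a toy child-to-parent hierarchy (TOY_PARENT, standing in for a real lexical hierarchy such as WordNet) and hypothetical helpers (ancestors, descendants, expand_image_labels). None of these names come from the paper. The sketch also assumes that each image-level label is expanded both upward (ancestors, which are implied by the original label) and downward (descendants, which are only candidates and therefore need reliability checking later). It is an illustration, not the authors' implementation.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy: child -> parent. A stand-in for a real lexical
# hierarchy (e.g. WordNet); not the hierarchy used in the paper.
TOY_PARENT: Dict[str, Optional[str]] = {
    "husky": "dog",
    "beagle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "animal": None,
}


def ancestors(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Walk up the hierarchy from `label`, returning ancestors nearest-first."""
    out: List[str] = []
    node = parent.get(label)
    while node is not None:
        out.append(node)
        node = parent.get(node)
    return out


def descendants(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Collect every node whose ancestor chain contains `label`."""
    return {child for child in parent if label in ancestors(child, parent)}


def expand_image_labels(
    labels: List[str], parent: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Map each expanded label to its origin: 'original', 'ancestor' or 'descendant'.

    Ancestors are implied by an image-level label (a husky is always a dog),
    while descendants are only possible refinements and are less reliable.
    """
    expanded: Dict[str, str] = {lab: "original" for lab in labels}
    for lab in labels:
        for anc in ancestors(lab, parent):
            expanded.setdefault(anc, "ancestor")
        for desc in descendants(lab, parent):
            expanded.setdefault(desc, "descendant")
    return expanded


if __name__ == "__main__":
    print(expand_image_labels(["dog"], TOY_PARENT))
```

For an image labeled only "dog", this yields "dog" as the original label, "mammal" and "animal" as ancestors, and "husky" and "beagle" as descendant candidates. Keeping the origin of each label lets a later stage treat certain and uncertain expansions differently.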

Findings

1. Achieves superior generalization across 14 datasets.
2. Effectively mitigates the image-to-box label mismatch.
3. Improves detection accuracy by integrating a language hierarchy.

Abstract

Recent studies on generalizable object detection have attracted increasing attention with additional weak supervision from large-scale datasets with image-level labels. However, weakly-supervised detection learning often suffers from image-to-box label mismatch, i.e., image-level labels do not convey precise object information. We design Language Hierarchical Self-training (LHST) that introduces language hierarchy into weakly-supervised detector training for learning more generalizable detectors. LHST expands the image-level labels with language hierarchy and enables co-regularization between the expanded labels and self-training. Specifically, the expanded labels regularize self-training by providing richer supervision and mitigating the image-to-box label mismatch, while self-training allows assessing and selecting the expanded labels according to the predicted reliability. In…
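The co-regularization described above can be pictured as a two-way filter. The expanded image-level labels restrict which classes the detector's predictions may turn into pseudo box labels, and the detector's confidence decides which expanded labels are kept. The sketch below is a hedged illustration under simplifying assumptions that are not from the paper: the predicted score is used directly as the "reliability", a fixed threshold (0.5 by default) is applied, and at most one pseudo box is kept per class. The names Detection and select_pseudo_boxes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    label: str
    score: float  # detector confidence, used here as the "reliability"


def select_pseudo_boxes(
    expanded_labels: List[str],
    detections: List[Detection],
    threshold: float = 0.5,
) -> Dict[str, Tuple[Box, float]]:
    """Turn detector predictions into pseudo box labels, guided by image labels.

    - Image-level labels regularize self-training: only classes in the
      expanded label set can become pseudo-labels.
    - Self-training selects expanded labels: a label is kept only if its
      best-scoring box reaches `threshold`.
    """
    allowed = set(expanded_labels)
    best: Dict[str, Detection] = {}
    for det in detections:
        if det.label not in allowed:
            continue  # class not supported by the (expanded) image-level labels
        if det.label not in best or det.score > best[det.label].score:
            best[det.label] = det
    return {
        lab: (det.box, det.score)
        for lab, det in best.items()
        if det.score >= threshold
    }


if __name__ == "__main__":
    dets = [
        Detection((0, 0, 10, 10), "husky", 0.8),
        Detection((1, 1, 9, 9), "husky", 0.6),
        Detection((0, 0, 10, 10), "beagle", 0.3),
        Detection((5, 5, 20, 20), "cat", 0.9),
    ]
    print(select_pseudo_boxes(["dog", "mammal", "husky", "beagle"], dets))
```

In this example only "husky" survives. "beagle" is allowed by the expanded labels but falls below the threshold, and "cat" scores highly but is not supported by the image-level labels, which is the kind of image-to-box mismatch the filter is meant to suppress.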

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need