SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Mingxuan Liu; Tyler L. Hayes; Elisa Ricci; Gabriela Csurka; Riccardo Volpi

arXiv:2405.10053·cs.CV·May 17, 2024

TL;DR

SHiNe introduces a hierarchy-aware classifier for open-vocabulary object detection that leverages class hierarchies to improve robustness and accuracy across diverse vocabularies without additional inference costs.

Contribution

It proposes a novel, training-free hierarchy-based classifier that enhances open-vocabulary detection robustness by integrating semantic hierarchy knowledge into existing detectors.

Findings

01. Up to +31.9% mAP50 improvement with ground-truth hierarchies.

02. Improves CLIP zero-shot accuracy by +2.8% on ImageNet-1k.

03. Integrates seamlessly with existing detectors without extra inference overhead.

Abstract

Open-vocabulary object detection (OvOD) has transformed detection into a language-guided task, empowering users to freely define their class vocabularies of interest during inference. However, our initial investigation indicates that existing OvOD detectors exhibit significant variability when dealing with vocabularies across various semantic granularities, posing a concern for real-world deployment. To this end, we introduce Semantic Hierarchy Nexus (SHiNe), a novel classifier that uses semantic knowledge from class hierarchies. It runs offline in three steps: i) it retrieves relevant super-/sub-categories from a hierarchy for each target class; ii) it integrates these categories into hierarchy-aware sentences; iii) it fuses these sentence embeddings to generate the nexus classifier vector. Our evaluation on various detection benchmarks demonstrates that SHiNe enhances robustness…
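The three offline steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy hierarchy, the hash-based `toy_text_encoder` (a stand-in for a real text encoder such as CLIP's), the "which is a" sentence template, and mean fusion are all assumptions made for illustration, and every function name here is hypothetical.

```python
import hashlib
import math
import random

# Hypothetical toy hierarchy: class -> (super-categories, sub-categories).
HIERARCHY = {
    "bat": (["mammal"], ["fruit bat", "vampire bat"]),
    "cat": (["mammal"], ["siamese cat"]),
}


def _normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def toy_text_encoder(sentence, dim=16):
    """Stand-in for a real text encoder (assumption): a deterministic
    pseudo-embedding seeded by a hash of the sentence."""
    digest = hashlib.sha256(sentence.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "little"))
    return _normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def hierarchy_sentences(name, supers, subs):
    """Step ii: one sentence per (sub, super) pair, ordered from specific
    to general. The template wording is an assumption."""
    sentences = []
    for sub in (subs or [None]):
        for sup in (supers or [None]):
            chain = [c for c in (sub, name, sup) if c is not None]
            sentences.append("a " + ", which is a ".join(chain))
    return sentences


def nexus_vector(name, hierarchy, encoder=toy_text_encoder):
    """Steps i-iii for one class: retrieve, build sentences, fuse."""
    supers, subs = hierarchy.get(name, ([], []))          # step i
    sentences = hierarchy_sentences(name, supers, subs)   # step ii
    embeddings = [encoder(s) for s in sentences]
    # Step iii: fuse by averaging (assumption), then re-normalize.
    fused = [sum(col) / len(embeddings) for col in zip(*embeddings)]
    return _normalize(fused)


def build_classifier(class_names, hierarchy):
    """One nexus vector per target class, computed once, offline."""
    return [nexus_vector(n, hierarchy) for n in class_names]


classifier = build_classifier(["bat", "cat"], HIERARCHY)
```

In this sketch the result holds exactly one vector per target class, so it has the same shape as a classifier built from plain class-name embeddings; all hierarchy processing happens before inference.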

Code & Models

Repositories

naver/shine (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training