Exploring Hierarchical Consistency and Unbiased Objectness for Open-Vocabulary Object Detection

Sanghoon Lee; Geon Lee; Hyekang Park; Bumsub Ham

arXiv:2604.23344·cs.CV·April 28, 2026

TL;DR

This paper introduces a hierarchical confidence calibration method and LoCLIP, a parameter-efficient CLIP adaptation, to improve open-vocabulary object detection by enhancing class label reliability and objectness scoring.

Contribution

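The paper proposes a pseudo labeling framework that combines hierarchical confidence calibration (HCC) with LoCLIP, targeting two problems in open-vocabulary detection: inaccurate class labels assigned by VLMs and unreliable objectness scores from RPNs trained only on base classes.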

Findings

01 Achieves state-of-the-art results on the COCO and LVIS benchmarks.

02 Improves class label reliability through hierarchical semantic consistency.

03 Enhances objectness estimation for novel classes with LoCLIP.

Abstract

Conventional object detectors typically operate under a closed-set assumption, limiting recognition to a predefined set of base classes seen during training. Open-vocabulary object detection (OVD) addresses this limitation by leveraging vision-language models (VLMs) to generate pseudo labels for novel object classes. However, existing OVD methods suffer from two critical drawbacks: (1) inaccurate class label assignments, as VLMs are optimized for image-level predictions rather than the region-level predictions required for pseudo labeling, and (2) unreliable objectness scores from region proposal networks (RPNs) trained exclusively on base object classes. To address these issues, we propose a novel pseudo labeling framework for OVD. Our approach introduces a hierarchical confidence calibration (HCC) technique, which ensures reliable class label estimation by assessing consistency across…
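To make the pseudo labeling step in the abstract concrete, here is a minimal sketch of generic VLM-based pseudo labeling: each region embedding is compared against class text embeddings, and a label is kept only when its softmax confidence clears a threshold. This is not the paper's HCC or LoCLIP. The function names, the `threshold` and `temperature` values, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def softmax(scores, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(region_embeddings, text_embeddings, class_names,
                 threshold=0.8, temperature=0.01):
    """Assign a class to each region; drop regions with low confidence.

    Hypothetical helper: the embeddings stand in for VLM image/text features,
    and the threshold and temperature are illustrative values.
    """
    labels = []
    for idx, region in enumerate(region_embeddings):
        sims = [cosine(region, text) for text in text_embeddings]
        probs = softmax(sims, temperature)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            labels.append((idx, class_names[best], probs[best]))
    return labels


# Toy usage: region 0 matches "cat" clearly, region 1 is ambiguous.
regions = [[1.0, 0.0], [1.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
print(pseudo_label(regions, texts, ["cat", "dog"]))
```

In this toy run the ambiguous region is discarded because neither class dominates. A single flat confidence threshold like this is the kind of label assignment the abstract describes as unreliable at the region level, which is the gap the paper's calibration is meant to address.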

Code & Models

Repositories

https://cvlab.yonsei.ac.kr/projects/HCC
