CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Chuofan Ma, Yi Jiang, Xin Wen, Zehuan Yuan, Xiaojuan Qi

TL;DR
CoDet introduces a co-occurrence-based method for aligning image regions with words. It improves open-vocabulary object detection by grouping images whose captions mention a shared concept and using visual similarity to discover the object common to the group.
Contribution
The paper presents CoDet, a novel approach that reformulates region-word alignment as a co-occurring object discovery problem, avoiding reliance on pre-aligned vision-language spaces.
Findings
Achieves 37.0 AP^m_{novel} on OV-LVIS, surpassing the previous state of the art by 4.2 points.
Demonstrates superior scalability and performance in open-vocabulary detection.
Effectively discovers shared objects through co-occurrence and visual similarity.
Abstract
Deriving reliable region-word alignment from image-text pairs is critical to learn object-level vision-language representations for open-vocabulary object detection. Existing methods typically rely on pre-trained or self-trained vision-language models for alignment, which are prone to limitations in localization accuracy or generalization capabilities. In this paper, we propose CoDet, a novel approach that overcomes the reliance on pre-aligned vision-language space by reformulating region-word alignment as a co-occurring object discovery problem. Intuitively, by grouping images that mention a shared concept in their captions, objects corresponding to the shared concept shall exhibit high co-occurrence among the group. CoDet then leverages visual similarities to discover the co-occurring objects and align them with the shared concept. Extensive experiments demonstrate that CoDet has…
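The intuition in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the use of cosine similarity, and the "best match per image, then average" scoring rule are all assumptions chosen for illustration, and the region features are hand-made vectors rather than detector outputs.

```python
import math
from collections import defaultdict


def group_by_concept(captions, vocabulary):
    """Map each concept to the indices of images whose caption mentions it.

    Assumption: naive lowercase whitespace tokenization, no punctuation handling.
    """
    groups = defaultdict(list)
    for idx, caption in enumerate(captions):
        tokens = set(caption.lower().split())
        for concept in vocabulary:
            if concept in tokens:
                groups[concept].append(idx)
    return dict(groups)


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def co_occurrence_scores(query_regions, support_images):
    """Score each query region by its best match in every support image, averaged.

    Assumption: every support image has at least one region.
    """
    scores = []
    for q in query_regions:
        best = [max(cosine(q, r) for r in regions) for regions in support_images]
        scores.append(sum(best) / len(best))
    return scores


def align_regions(image_regions, group):
    """For each image in the group, pick the region that recurs most in the others."""
    if len(group) < 2:
        return {}  # co-occurrence is undefined for a single image
    aligned = {}
    for i in group:
        support = [image_regions[j] for j in group if j != i]
        scores = co_occurrence_scores(image_regions[i], support)
        aligned[i] = max(range(len(scores)), key=scores.__getitem__)
    return aligned


# Hypothetical toy data: three captions and hand-made 3-d region features.
captions = ["a dog on grass", "a dog near a car", "a cat on a sofa"]
image_regions = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # image 0: dog-like, grass-like
    [[0.0, 0.0, 1.0], [0.9, 0.1, 0.0]],  # image 1: car-like, dog-like
    [[0.0, 1.0, 1.0]],                   # image 2: unrelated
]

groups = group_by_concept(captions, ["dog", "cat"])
print(groups)                                       # {'dog': [0, 1], 'cat': [2]}
print(align_regions(image_regions, groups["dog"]))  # {0: 0, 1: 1}
```

In the "dog" group, the dog-like region of each image is the only one with a close match in the other image, so it receives the highest co-occurrence score and is the region that would be paired with the word "dog". Note that no text embedding is consulted at any point, which mirrors the abstract's claim of avoiding a pre-aligned vision-language space.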
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: ALIGN
