DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Lewei Yao; Jianhua Han; Youpeng Wen; Xiaodan Liang; Dan Xu; Wei Zhang; Zhenguo Li; Chunjing Xu; Hang Xu

arXiv:2209.09407 · cs.CV · October 18, 2022 · 64 cites

TL;DR

DetCLIP introduces a knowledge-enriched, dictionary-based pre-training approach for open-world object detection, significantly improving zero-shot detection performance by leveraging concept descriptions and relationships.

Contribution

DetCLIP proposes a paralleled concept formulation and a concept dictionary with descriptions to improve open-world and zero-shot detection.

Findings

1. DetCLIP-T outperforms GLIP-T by 9.9% mAP on LVIS.

2. DetCLIP-T achieves a 13.5% improvement on rare LVIS categories over a fully supervised model with the same backbone.

3. DetCLIP demonstrates strong zero-shot detection performance.

Abstract

Open-world object detection, as a more general and challenging goal, aims to recognize and localize objects described by arbitrary category names. The recent work GLIP formulates this problem as a grounding problem by concatenating all category names of detection datasets into sentences, which leads to inefficient interaction between category names. This paper presents DetCLIP, a paralleled visual-concept pre-training method for open-world detection by resorting to knowledge enrichment from a designed concept dictionary. To achieve better learning efficiency, we propose a novel paralleled concept formulation that extracts concepts separately to better utilize heterogeneous datasets (i.e., detection, grounding, and image-text pairs) for training. We further design a concept dictionary (with descriptions) from various online sources and detection datasets to provide prior knowledge for…
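The contrast the abstract draws between concatenating category names into one sentence and extracting concepts separately can be illustrated at the level of text inputs. The sketch below is a minimal illustration, not the authors' implementation: the function names, the "name, description" joining format, the toy dictionary entry, and the category list are all assumptions made up for this example.

```python
from typing import Dict, List


def concatenated_prompt(category_names: List[str]) -> str:
    """Grounding-style input: all category names joined into one sentence."""
    return ". ".join(category_names) + "."


def parallel_inputs(category_names: List[str],
                    dictionary: Dict[str, str]) -> List[str]:
    """Parallel-style input: one independent text per concept.

    If the (hypothetical) dictionary holds a description for a concept,
    it is appended to the name; otherwise the bare name is used.
    The "name, description" format is an assumption of this sketch.
    """
    inputs = []
    for name in category_names:
        description = dictionary.get(name)
        inputs.append(f"{name}, {description}" if description else name)
    return inputs


# Toy data, invented for illustration only.
names = ["person", "bicycle", "hair drier"]
toy_dictionary = {
    "hair drier": "a hand-held device that blows warm air to dry hair",
}

single_text = concatenated_prompt(names)                 # one text for all categories
separate_texts = parallel_inputs(names, toy_dictionary)  # one text per concept
```

In the concatenated form, a text encoder would see every category in a single sequence, so the names can interact with one another. In the parallel form, each concept is an independent input, which also leaves room to attach a per-concept description from a dictionary.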

Videos

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling