CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection

Qibo Chen; Weizhong Jin; Jianyue Ge; Mengdi Liu; Yuchao Yan; Jian Jiang; Li Yu; Xuanjiang Guo; Shuchang Li; Jianzhong Chen

arXiv:2412.09799·cs.CV·December 16, 2024

TL;DR

CP-DETR introduces a universal object detection model that effectively utilizes concept prompts and hybrid encoding to improve zero-shot and few-shot detection across diverse scenarios.

Contribution

The paper proposes a novel prompt-guided hybrid encoder and concept prompt generation methods, enhancing universal detection performance with a single pre-trained model.

Findings

1. Achieves 47.6 zero-shot AP on LVIS with a Swin-T backbone.

2. Attains 68.4 AP on COCO val with visual prompts.

3. Reaches 73.1 fully-shot AP on ODinW13 with optimized prompts.

Abstract

Recent research on universal object detection aims to introduce language in a SoTA closed-set detector and then generalize the open-set concepts by constructing large-scale (text-region) datasets for training. However, these methods face two main challenges: (i) how to efficiently use the prior information in the prompts to genericise objects and (ii) how to reduce alignment bias in the downstream tasks, both leading to sub-optimal performance in some scenarios beyond pre-training. To address these challenges, we propose a strong universal detection foundation model called CP-DETR, which is competitive in almost all scenarios, with only one pre-training weight. Specifically, we design an efficient prompt visual hybrid encoder that enhances the information interaction between prompt and visual through scale-by-scale and multi-scale fusion modules. Then, the hybrid encoder is facilitated…
