OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision

Junjie Wang; Bin Chen; Bin Kang; Yulin Li; YiChi Chen; Weizhi Xian; Huifeng Chang; Yong Xu

arXiv:2405.17913·cs.CV·August 22, 2024·1 cites

TL;DR

This paper introduces OV-DQUO, a novel open-vocabulary detection method that uses denoising text query training and unknown object supervision to improve detection of novel categories, achieving state-of-the-art results.

Contribution

The paper proposes a wildcard matching method and a denoising text query training strategy to enhance open-vocabulary detection, addressing confidence bias and background confusion issues.

Findings

1. Achieved 45.6 AP50 on the OV-COCO benchmark.

2. Achieved 39.3 mAP on the OV-LVIS benchmark.

3. Outperformed previous methods without extra training data.

Abstract

Open-vocabulary detection aims to detect objects from novel categories beyond the base categories on which the detector is trained. However, existing open-vocabulary detectors trained on base category data tend to assign higher confidence to trained categories and confuse novel categories with the background. To resolve this, we propose OV-DQUO, an Open-Vocabulary DETR with Denoising text Query training and open-world Unknown Objects supervision. Specifically, we introduce a wildcard matching method. This method enables the detector to learn from pairs of unknown objects recognized by the open-world detector and text embeddings with general semantics, mitigating the confidence bias between base and novel categories. Additionally, we propose a denoising text query training strategy. It synthesizes foreground and background query-box…
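The wildcard matching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`cosine`, `build_targets`, `classify`), the plain-list embeddings, and the choice of a single shared wildcard embedding are all assumptions made for illustration. It only shows the general pattern of pairing base-category boxes with their category text embedding, pairing unknown-object proposals with a general-semantics embedding, and scoring a region by cosine similarity against text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_targets(base_boxes, unknown_boxes, category_embeddings, wildcard_embedding):
    """Pair each box with the text embedding it would be trained against.

    base_boxes: list of (box, category_name) pairs with known base labels.
    unknown_boxes: list of boxes proposed by an open-world detector (no label).
    wildcard_embedding: hypothetical stand-in for a "general semantics" text embedding.
    """
    targets = []
    for box, category in base_boxes:
        targets.append((box, category_embeddings[category]))
    for box in unknown_boxes:
        # Unknown objects have no category name, so they share the wildcard embedding.
        targets.append((box, wildcard_embedding))
    return targets


def classify(region_embedding, category_embeddings):
    """Return the best-matching category name and all similarity scores."""
    scores = {
        name: cosine(region_embedding, emb)
        for name, emb in category_embeddings.items()
    }
    return max(scores, key=scores.get), scores
```

At inference time, novel categories would simply be added to `category_embeddings` by name; the wildcard embedding is only used to build training pairs in this sketch.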

Code & Models

Repositories

xiaomoguhz/ov-dquo
PyTorch · Official

Videos

OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections