How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Yiyang Yao; Peng Liu; Tiancheng Zhao; Qianqian Zhang; Jiajia Liao; Chunxin Fang; Kyusong Lee; Qing Wang

arXiv:2308.13177 · cs.CV · December 19, 2023

Open Access

TL;DR

This paper introduces OVDEval, a comprehensive benchmark with 9 sub-tasks, and a new evaluation metric, NMS-AP, to better assess the generalization and understanding capabilities of open-vocabulary object detection (OVD) models.

Contribution

The paper presents a new benchmark dataset with fine-grained tasks and a novel evaluation metric, addressing limitations of existing evaluation methods, which test generalization only over object types and referral expressions.

Findings

1. Existing top OVD models struggle on the new fine-grained tasks.
2. NMS-AP provides a more accurate evaluation than traditional AP.
3. The benchmark reveals weaknesses in current OVD models.

Abstract

Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods and datasets are limited to testing generalization over object types and referral expressions, which do not provide a systematic, fine-grained, and accurate benchmark of OVD models' abilities. In this paper, we propose a new benchmark named OVDEval, which includes 9 sub-tasks and introduces evaluations on commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is meticulously created to provide hard negatives that challenge models' true understanding of visual and linguistic input. Additionally, we identify a problem with the popular Average Precision (AP) metric when…
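The abstract is cut off before it explains the AP problem. The toy Python sketch below illustrates one plausible reading, assuming (not confirmed by this page) that NMS-AP runs class-agnostic non-maximum suppression over a model's predictions before computing ordinary AP, with an assumed IoU threshold of 0.5. The function names `iou`, `class_agnostic_nms`, `average_precision`, `mean_ap`, and `nms_ap` are hypothetical, and the AP here is a simplified single-image, non-interpolated version rather than the full COCO protocol.

```python
from typing import List, Tuple

# Hypothetical toy types: boxes are (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]
Detection = Tuple[Box, str, float]  # (box, label, score)
GroundTruth = Tuple[Box, str]       # (box, label)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy NMS that ignores labels: among overlapping boxes, keep the top-scoring one."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[2], reverse=True):
        if all(iou(det[0], k[0]) <= iou_thr for k in kept):
            kept.append(det)
    return kept


def average_precision(dets: List[Detection], gts: List[GroundTruth],
                      label: str, iou_thr: float = 0.5) -> float:
    """Simplified non-interpolated AP for one label in one image."""
    preds = sorted((d for d in dets if d[1] == label), key=lambda d: d[2], reverse=True)
    gt_boxes = [g[0] for g in gts if g[1] == label]
    if not gt_boxes:
        return 0.0
    matched = [False] * len(gt_boxes)
    tp, ap = 0, 0.0
    for rank, (box, _, _) in enumerate(preds, start=1):
        # Match the prediction to the best still-unmatched ground-truth box.
        best, best_j = 0.0, -1
        for j, gt_box in enumerate(gt_boxes):
            overlap = iou(box, gt_box)
            if not matched[j] and overlap > best:
                best, best_j = overlap, j
        if best >= iou_thr:
            matched[best_j] = True
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / len(gt_boxes)


def mean_ap(dets: List[Detection], gts: List[GroundTruth], iou_thr: float = 0.5) -> float:
    """Per-label AP averaged over labels that have ground truth; labels are scored independently."""
    labels = sorted({g[1] for g in gts})
    return sum(average_precision(dets, gts, lab, iou_thr) for lab in labels) / len(labels)


def nms_ap(dets: List[Detection], gts: List[GroundTruth], iou_thr: float = 0.5) -> float:
    """Assumed NMS-AP: class-agnostic NMS first, then the same AP computation."""
    return mean_ap(class_agnostic_nms(dets, iou_thr), gts, iou_thr)


if __name__ == "__main__":
    gts = [((0, 0, 10, 10), "red car")]
    # The model emits the same box for the positive label and a hard negative,
    # scoring the wrong label ("blue car") higher.
    dets = [((0, 0, 10, 10), "red car", 0.6), ((0, 0, 10, 10), "blue car", 0.9)]
    print(mean_ap(dets, gts))  # 1.0: per-label AP never compares the two labels
    print(nms_ap(dets, gts))   # 0.0: only the higher-scoring "blue car" box survives
```

In this toy case, a model that attaches both the positive label and a hard-negative label to one box still earns full per-label AP, because each label is ranked on its own. Suppressing overlapping boxes regardless of label forces the model's top-scoring label to be the one that counts, which is the kind of inflation a hard-negative benchmark would need to guard against.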

Code & Models

Models

omlab/omchat-v2.0-13B-single-beta_hf
model · 19 dl · ♡ 5

Datasets

omlab/OVDEval
dataset · 348 dl

Videos

How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection · Underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques