Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts

Prakash Chandra Chhipa; Kanjar De; Meenakshi Subhash Chippa; Rajkumar Saini; Marcus Liwicki

arXiv:2405.14874·cs.CV·September 9, 2024

Open Access

TL;DR

This paper evaluates the robustness of recent open-vocabulary object detection models under various distribution shifts, revealing significant challenges and guiding future research for more reliable vision systems.

Contribution

The paper provides a comprehensive zero-shot robustness assessment of three recent open-vocabulary object detectors (OWL-ViT, YOLO World, and Grounding DINO) on the COCO-O, COCO-DC, and COCO-C benchmarks.

Findings

01. Models show decreased performance under distribution shifts.

02. Robustness varies significantly across different types of shifts.

03. The results highlight the need for improved robustness in open-vocabulary detection.

Abstract

The challenge of Out-Of-Distribution (OOD) robustness remains a critical hurdle towards deploying deep vision models. Vision-Language Models (VLMs) have recently achieved groundbreaking results. VLM-based open-vocabulary object detection extends the capabilities of traditional object detection frameworks, enabling the recognition and classification of objects beyond predefined categories. Investigating OOD robustness in recent open-vocabulary object detection is essential to increase the trustworthiness of these models. This study presents a comprehensive robustness evaluation of the zero-shot capabilities of three recent open-vocabulary (OV) foundation object detection models: OWL-ViT, YOLO World, and Grounding DINO. Experiments carried out on the robustness benchmarks COCO-O, COCO-DC, and COCO-C encompassing distribution shifts due to information loss, corruption, adversarial attacks,…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Layer Normalization · Vision Transformer · self-DIstillation with NO labels