On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes
Sadia Ilyas, Ido Freeman, Matthias Rottmann

TL;DR
This paper evaluates the effectiveness of open-vocabulary object detection models in identifying unusual and out-of-distribution objects in street scenes, highlighting their potential and current limitations for real-world applications.
Contribution
The study benchmarks four state-of-the-art open-vocabulary object detectors across three datasets, revealing their strengths and shortcomings in challenging street scene scenarios.
Findings
Grounding DINO achieves the top AP of 48.3% on RoadObstacle21.
YOLO-World achieves 21.2% AP on RoadAnomaly21.
Open-vocabulary models show promise but require improvements for reliable deployment.
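For readers unfamiliar with the AP figures quoted above, the following is a minimal sketch of how average precision can be computed for box detections. It is an illustration under stated assumptions, not the benchmark's evaluation protocol: it uses a single image, a single class, a fixed IoU threshold of 0.5, and all-point interpolation of the precision-recall curve. The helper names `iou` and `average_precision` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[float, Box]],  # (confidence score, box)
    gts: List[Box],
    iou_thr: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Area under the precision-recall curve for one image and one class."""
    if not gts:
        return 0.0
    # Process detections from most to least confident.
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = [False] * len(gts)
    tp_flags = []
    for _, box in ranked:
        # Greedily match to the best still-unmatched ground-truth box.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    # Precision and recall after each detection in ranked order.
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Make precision non-increasing from right to left (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

A false positive ranked between two correct detections lowers AP below 1.0, which is why unusual objects that are detected with low confidence, or not at all, drag the reported scores down.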
Abstract
Out-of-distribution (OOD) object detection is the task of detecting objects that originate from a data distribution different from that of the training data. In this study, we investigate to what extent state-of-the-art open-vocabulary object detectors can detect unusual objects in street scenes, which are considered OOD or rare scenarios with respect to common street scene datasets. Specifically, we evaluate their performance on the OoDIS Benchmark, which extends RoadAnomaly21 and RoadObstacle21 from SegmentMeIfYouCan, as well as on LostAndFound, which was recently extended to object-level annotations. The objective of our study is to uncover shortcomings of contemporary object detectors in challenging real-world scenarios, particularly open-world ones. Our experiments reveal that open-vocabulary models are promising for OOD object detection scenarios, however far…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Layer Normalization · Attention Is All You Need · Dense Connections · Vision Transformer · self-DIstillation with NO labels
