Open-vocabulary Attribute Detection
María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox

TL;DR
This paper introduces the OVAD benchmark for open-vocabulary attribute detection: a densely annotated test set covering 117 attribute classes on the 80 MS COCO object classes (1.4 million annotations), together with a baseline method, for evaluating how well vision-language models recognize object attributes in a zero-shot setting.
Contribution
The paper presents the first benchmark and dataset for open-vocabulary object attribute detection, enabling systematic evaluation of models' attribute recognition capabilities.
Findings
Baseline methods show limited attribute detection performance.
The benchmark reveals strengths and weaknesses of current vision-language models.
Open-vocabulary attribute detection remains a challenging task.
Abstract
Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enables open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for…
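The abstract notes that the test set carries both positive and negative annotations. The following is a minimal sketch of why that matters for scoring, not the benchmark's actual evaluation code: it assumes each model outputs one score per (object instance, attribute) pair, that unannotated pairs are marked `None` and skipped, and that per-attribute average precision is averaged into a single number. The function names and the choice of metric are illustrative assumptions; the excerpt above does not specify the metric used.

```python
from typing import List, Optional, Sequence


def average_precision(
    scores: Sequence[float], labels: Sequence[Optional[int]]
) -> Optional[float]:
    """Average precision for one attribute.

    labels: 1 = annotated positive, 0 = annotated negative,
    None = not annotated (hypothetical convention for this sketch).
    """
    # Keep only entries that have an explicit positive or negative label.
    pairs = [(s, y) for s, y in zip(scores, labels) if y is not None]
    n_pos = sum(y for _, y in pairs)
    if n_pos == 0:
        return None  # AP is undefined without at least one positive.

    # Rank annotated instances from highest to lowest score.
    pairs.sort(key=lambda p: p[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(pairs, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / n_pos


def mean_average_precision(
    score_matrix: List[List[float]], label_matrix: List[List[Optional[int]]]
) -> Optional[float]:
    """Mean AP over attributes; rows are instances, columns are attributes."""
    aps = []
    for j in range(len(score_matrix[0])):
        ap = average_precision(
            [row[j] for row in score_matrix],
            [row[j] for row in label_matrix],
        )
        if ap is not None:  # skip attributes with no annotated positives
            aps.append(ap)
    return sum(aps) / len(aps) if aps else None


if __name__ == "__main__":
    # Toy data: 4 object instances, 2 attributes.
    toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.1]]
    toy_labels = [[1, 0], [None, 1], [0, None], [1, None]]
    print(mean_average_precision(toy_scores, toy_labels))
```

With explicit negatives, a high score on an unannotated pair is neither rewarded nor penalized, so the ranking is judged only where ground truth exists.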
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
Methods: Test
