The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

Alina Kuznetsova; Hassan Rom; Neil Alldrin; Jasper Uijlings; Ivan Krasin; Jordi Pont-Tuset; Shahab Kamali; Stefan Popov; Matteo Malloci; Alexander Kolesnikov; Tom Duerig; Vittorio Ferrari

arXiv:1811.00982·cs.CV·March 26, 2020·613 cites

TL;DR

Open Images V4 is a dataset of 9.2M images with unified annotations for image classification, object detection, and visual relationship detection.

Contribution

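The paper introduces Open Images V4, a dataset with unified annotations for image classification, object detection, and visual relationship detection. It provides 15x more bounding boxes than the next largest datasets, and its images were collected without a predefined list of class names or tags. Annotating the same images for all three tasks facilitates multi-task learning and structured reasoning.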

Findings

01

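The dataset contains 9.2M images, with 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes.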

02

It provides 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images).

03

Unified annotations cover image classification, object detection, and visual relationship detection, so the three tasks can be studied on one dataset.

Abstract

We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows sharing and adapting the material, and they have been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias. Open Images V4 offers large scale across several dimensions: 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes. For object detection in particular, we provide 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images). The images often show complex scenes with several objects (8 annotated objects per image on average). We annotated visual…
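The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the inputs are the rounded values quoted in the abstract, and the derived ratios are therefore approximate rather than official dataset statistics.

```python
# Rounded figures as quoted in the abstract (not exact dataset counts).
NUM_IMAGES = 9.2e6         # total images in Open Images V4
NUM_BOXES = 15.4e6         # bounding boxes
NUM_BOXED_IMAGES = 1.9e6   # images that carry bounding-box annotations


def boxes_per_boxed_image(num_boxes: float, num_boxed_images: float) -> float:
    """Average number of boxes on an image that has box annotations."""
    return num_boxes / num_boxed_images


def boxed_fraction(num_boxed_images: float, num_images: float) -> float:
    """Share of all images that have at least some box annotations."""
    return num_boxed_images / num_images


avg_boxes = boxes_per_boxed_image(NUM_BOXES, NUM_BOXED_IMAGES)
frac_boxed = boxed_fraction(NUM_BOXED_IMAGES, NUM_IMAGES)

print(f"boxes per annotated image: {avg_boxes:.1f}")
print(f"share of images with boxes: {frac_boxed:.1%}")
```

Under these rounded inputs, the ratio comes out at roughly 8.1 boxes per box-annotated image, which agrees with the "8 annotated objects per image on average" stated in the abstract, and about 21% of the 9.2M images carry box annotations.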

Code & Models

Repositories

ccc013/DeepLearning_Notes
tf

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques