YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

Esteban Real; Jonathon Shlens; Stefano Mazzocchi; Xin Pan; Vincent Vanhoucke

arXiv:1702.00824·cs.CV·March 28, 2017·46 cites

TL;DR

This paper introduces YouTube-BoundingBoxes, a large-scale, high-precision video dataset with dense object annotations, aiming to advance research in video object detection and tracking.

Contribution

The creation of a large, high-quality, human-annotated video dataset with dense bounding boxes and classification labels for object detection research.

Findings

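01

The data set contains approximately 380,000 video segments, each about 19 s long.

02

Label accuracy is above 95% for every class, and bounding boxes are tight.

03

Baseline per-frame classification and localization results are reported for well-known deep network architectures.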

Abstract

We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. The objects represent a subset of the MS COCO label set. All video segments were human-annotated with high-precision classification labels and bounding boxes at 1 frame per second. The use of a cascade of increasingly precise human annotations ensures a label accuracy above 95% for every class and tight bounding boxes. Finally, we train and evaluate well-known deep network architectures and report baseline figures for per-frame classification and localization to provide a point of comparison for…
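
To make the sampling rate in the abstract concrete, the following is a minimal sketch of how annotation timestamps could be laid out for one segment at 1 frame per second. The function name `annotation_timestamps_ms`, the millisecond units, and the half-open interval convention (start included, end excluded) are assumptions made for illustration; they are not taken from the paper or from the data set's file format.

```python
from typing import List


def annotation_timestamps_ms(start_ms: int, end_ms: int, fps: float = 1.0) -> List[int]:
    """Return the timestamps (in ms) at which a segment would be annotated.

    Hypothetical helper: assumes the segment is sampled at a fixed rate,
    starting at start_ms and stopping before end_ms (half-open interval).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if end_ms < start_ms:
        raise ValueError("end_ms must not precede start_ms")
    # Spacing between consecutive annotated frames, in whole milliseconds.
    step_ms = round(1000 / fps)
    return list(range(start_ms, end_ms, step_ms))


# A segment of about 19 s sampled at 1 frame per second.
timestamps = annotation_timestamps_ms(0, 19_000)
print(len(timestamps))
```

Under these assumptions, a typical 19 s segment yields 19 annotated frames. Whether the real annotations include the final instant of a segment, or mark frames where the object is absent, is not stated in the abstract.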

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications