Web Video in Numbers - An Analysis of Web-Video Metadata
Luca Rossetto, Heiko Schuldt

TL;DR
This paper analyzes metadata from more than 120 million web videos harvested from Vimeo and YouTube in 2016 and compares their properties with those of commonly used video collections, showing that these collections do not (or no longer) reflect web video as found in the wild.
Contribution
It provides a large-scale analysis of web video metadata and highlights the gap between real-world web videos and curated collections.
Findings
Existing collections do not accurately reflect web video properties.
Web videos exhibit diverse metadata characteristics.
Significant differences were found between web videos and curated datasets.
Abstract
Web video is often used as a source of data in various fields of study. While specialized subsets of web video, mainly earmarked for dedicated purposes, are often analyzed in detail, there is little information available about the properties of web video as a whole. In this paper we present insights gained from the analysis of the metadata associated with more than 120 million videos harvested from two popular web video platforms, Vimeo and YouTube, in 2016 and compare their properties with the ones found in commonly used video collections. This comparison has revealed that existing collections do not (or no longer) properly reflect the properties of web video "in the wild".
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Web Data Mining and Analysis · Caching and Content Delivery
