Insights on the V3C2 Dataset
Luca Rossetto, Klaus Schoeffmann, Abraham Bernstein

TL;DR
This paper provides insights into the V3C2 video dataset, highlighting its potential for research in video retrieval and offering data to facilitate its use, thereby supporting standardized experimentation.
Contribution
It offers a detailed analysis of the V3C2 dataset, provides the extracted data to aid researchers, and discusses the implications for video retrieval research.
Findings
The full V3C collection contains roughly 3800 hours of video, split into three shards; V3C2 is the second of these shards.
V3C is designed to be representative of video content found on the web.
All extracted data is provided to simplify dataset usage.
Abstract
For research results to be comparable, it is important to have common datasets for experimentation and evaluation. The size of such datasets, however, can be an obstacle to their use. The Vimeo Creative Commons Collection (V3C) is a video dataset designed to be representative of video content found on the web, containing roughly 3800 hours of video in total, split into three shards. In this paper, we present insights on the second of these shards (V3C2) and discuss their implications for research areas, such as video retrieval, for which the dataset might be particularly useful. We also provide all the extracted data in order to simplify the use of the dataset.
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
