Insights on the V3C2 Dataset

Luca Rossetto; Klaus Schoeffmann; Abraham Bernstein

arXiv:2105.01475·cs.MM·May 5, 2021

TL;DR

This paper analyzes V3C2, the second shard of the Vimeo Creative Commons Collection (V3C), discusses what the findings imply for research areas such as video retrieval, and releases the extracted data to make the dataset easier to use in standardized experiments.

Contribution

It presents an analysis of the V3C2 shard, discusses the implications for video retrieval research, and provides the extracted data to aid researchers.

Findings

01

The full V3C collection contains roughly 3800 hours of video, split into three shards; V3C2 is the second of these shards.

02

The dataset is designed to be representative of video content found on the web.

03

All extracted data is provided to simplify dataset usage.

Abstract

For research results to be comparable, it is important to have common datasets for experimentation and evaluation. The size of such datasets, however, can be an obstacle to their use. The Vimeo Creative Commons Collection (V3C) is a video dataset designed to be representative of video content found on the web, containing roughly 3800 hours of video in total, split into three shards. In this paper, we present insights on the second of these shards (V3C2) and discuss their implications for research areas, such as video retrieval, for which the dataset might be particularly useful. We also provide all the extracted data in order to simplify the use of the dataset.

Code & Models

Repositories

lucaro/V3C2Analysis (official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization