YouNiverse: Large-Scale Channel and Video Metadata from English-Speaking YouTube
Manoel Horta Ribeiro, Robert West

TL;DR
YouNiverse is a large-scale dataset of channel and video metadata from English-language YouTube, including weekly channel-level time series and a table of which videos users commented on, intended to support research into the platform's dynamics and content.
Contribution
This paper introduces YouNiverse, a large dataset of YouTube channel and video metadata built to address two obstacles to studying the platform: the lack of randomly sampled data and the absence of a systematic way to query its catalog.
Findings
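The dataset covers over 136k channels and 72.9M videos published between May 2005 and October 2019.
It includes channel-level time series with weekly subscriber and view counts.
It includes a table specifying which videos a set of 449M anonymous users commented on.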
Abstract
YouTube plays a key role in entertaining and informing people around the globe. However, studying the platform is difficult due to the lack of randomly sampled data and of systematic ways to query the platform's colossal catalog. In this paper, we present YouNiverse, a large collection of channel and video metadata from English-language YouTube. YouNiverse comprises metadata from over 136k channels and 72.9M videos published between May 2005 and October 2019, as well as channel-level time-series data with weekly subscriber and view counts. Leveraging channel ranks from socialblade.com, an online service that provides information about YouTube, we are able to assess and enhance the representativeness of the sample of channels. The dataset also contains a table specifying which videos a set of 449M anonymous users commented on. YouNiverse, publicly available at…
Taxonomy
Topics: Media Influence and Politics · Misinformation and Its Impacts · Media Studies and Communication
