TubeCensus: A Transparent, Replicable, and Large-Scale Census of YouTube Channels and their Subscriber Counts Over Time
Chloe Eggleston, Abram Handler, Maria Leonor Pacheco

TL;DR
TubeCensus provides a transparent, large-scale, longitudinal dataset of YouTube creators and subscriber counts over nearly two decades, enabling research without relying on the platform's API.
Contribution
It introduces a novel, replicable method for constructing a comprehensive YouTube creator dataset from Internet Archive captures, bypassing API limitations.
Findings
TubeCensus covers at least 30-36% of all YouTube content.
The dataset captures prominent creators well.
It enables exploratory analysis of YouTube channel growth mechanisms.
Abstract
YouTube is central to contemporary mass media. However, the official YouTube API does not provide access to the full set of creators or creator metadata on the platform. This lack of basic visibility into the YouTube ecosystem hinders understanding of the platform's creator economy. Researchers currently have no easy, transparent, or replicable way to construct large-scale datasets of YouTube creators and their audiences over time. This makes it challenging to study vital social questions, such as how changes to the YouTube recommendation algorithm shape creator incentives and by extension the mass media on the platform. We address this gap with TubeCensus, a large-scale longitudinal dataset of YouTube creators and subscriber counts, constructed by collecting, linking, and organizing nearly two decades of YouTube page captures from the Internet Archive. This approach is transparent and…
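As a rough illustration of the collection step described above, the following sketch shows one way to list Internet Archive captures of a single YouTube channel page through the Wayback Machine CDX server. It is not the authors' pipeline. The helper names (`build_cdx_query`, `parse_cdx_rows`, `replay_url`), the example channel URL, and the choice of fields and filters are assumptions made for illustration; only the CDX endpoint and its `url`, `output`, `from`, `to`, `filter`, and `fl` parameters come from the Wayback Machine itself.

```python
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint (lists captures for a URL).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, start_year, end_year):
    """Build a CDX query URL for captures of page_url in a year range.

    Hypothetical helper: the fields and filter chosen here are
    illustrative, not taken from the paper.
    """
    params = {
        "url": page_url,
        "output": "json",            # JSON array of rows, header row first
        "from": str(start_year),
        "to": str(end_year),
        "filter": "statuscode:200",  # keep successful captures only
        "fl": "timestamp,original",  # fields to return
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Turn a CDX JSON response (header row + data rows) into dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def replay_url(capture):
    """Return the Wayback replay URL for one parsed capture."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"
```

In use, the query URL would be fetched with any HTTP client, the JSON body passed to `parse_cdx_rows`, and each replay URL downloaded so that the subscriber count shown on the archived page can be extracted. That extraction step depends on page layouts that changed over the years and is not sketched here.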
