Forgetful by Design? A Critical Audit of YouTube's Search API for Academic Research
Bernhard Rieder, Adrian Padilla, Oscar Coromina

TL;DR
This paper critically evaluates YouTube's Data API search endpoint, revealing significant limitations in data completeness, consistency, and bias that undermine its suitability for rigorous academic research.
Contribution
It provides a systematic audit of YouTube's search API, highlighting its deficiencies and offering mitigation strategies for researchers.
Findings
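Ranking by relevance and ranking by date differ substantially in video recall and precision; relevance often retrieves numerous off-topic videos.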
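Temporal decay: the number of retrievable videos for a given period drops dramatically within 20-60 days of publication, even though the videos remain on the platform.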
Identical queries return different video sets over time, which undermines reproducibility.
Abstract
This paper critically audits the search endpoint of YouTube's Data API (v3), a common tool for academic research. Through systematic weekly searches over six months using eleven queries, we identify major limitations regarding completeness, representativeness, consistency, and bias. Our findings reveal substantial differences between ranking parameters like relevance and date in terms of video recall and precision, with relevance often retrieving numerous off-topic videos. We also observe severe temporal decay in video discoverability: the number of retrievable videos for a given period drops dramatically within just 20-60 days of publication, even though these videos remain on the platform. This potentially undermines research designs that rely on systematic data collection. Furthermore, search results lack consistency, with identical queries yielding different video sets over time,…
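The repeated-collection design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the public `search.list` endpoint of the YouTube Data API v3 and its documented `q`, `order`, `publishedAfter`, `publishedBefore`, `maxResults`, and `pageToken` parameters. The helper names (`build_search_url`, `jaccard`, `retention`) and the video IDs are hypothetical and serve only to show how two collection runs for the same query and publication window might be compared.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, order, published_after, published_before,
                     api_key, page_token=None):
    """Build a search.list request URL for one query, ranking order, and
    publication window (timestamps in RFC 3339, e.g. 2024-01-01T00:00:00Z)."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "order": order,  # e.g. "relevance" or "date"
        "publishedAfter": published_after,
        "publishedBefore": published_before,
        "maxResults": 50,  # largest page size the endpoint accepts
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def jaccard(run_a, run_b):
    """Overlap between two sets of video IDs (1.0 means identical sets)."""
    a, b = set(run_a), set(run_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def retention(first_run, later_run):
    """Share of videos from the first run that a later run still returns."""
    first = set(first_run)
    if not first:
        return 0.0
    return len(first & set(later_run)) / len(first)


# Hypothetical video IDs returned for the same query and publication window,
# collected shortly after publication and again several weeks later.
early_run = ["vid_a", "vid_b", "vid_c", "vid_d"]
late_run = ["vid_a", "vid_b", "vid_e"]

print(jaccard(early_run, late_run))    # overlap between the two runs
print(retention(early_run, late_run))  # share of early videos still retrievable
```

A low `jaccard` value across runs of an identical query would point to the consistency problem, while a falling `retention` value for a fixed publication window would point to temporal decay. Storing the video IDs from every run, rather than only the latest one, is what makes either comparison possible.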
