On YouTube Search API Use in Research
Alexandros Efstratiou

TL;DR
This study tracks the behavior of YouTube's Search API over 12 weeks and finds high variability and potential biases in the returned videos, which affects any research that relies on this data source.
Contribution
The paper provides an empirical analysis of YouTube's Search API, highlighting its variability and biases, and offers strategies for researchers to mitigate these issues.
Findings
Search results are highly variable over time.
The API may prioritize shorter, more popular videos; the role of channel popularity is less clear.
Returned videos vary with the topic's relative popularity during the query period.
Abstract
YouTube is among the most widely used platforms worldwide, and has seen considerable recent academic attention. Despite its popularity and the number of studies conducted on it, much less is understood about the way in which YouTube's Data API, and especially the Search endpoint, operates. In this paper, we analyze the API's behavior by running identical queries across a period of 12 weeks. Our findings show that the search endpoint returns highly variable results between queries. Specifically, the API seems to randomize returned videos based on the relative popularity of the respective topic during the query period, making it nearly impossible to obtain representative historical video samples, especially during non-peak topical periods. Our results also suggest that the API may prioritize shorter, more popular videos, although the role of channel popularity is not as clear. We conclude…
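One way to quantify the variability between identical queries is to compare the sets of video IDs returned by each run. The sketch below is illustrative only and is not the paper's method: it assumes each weekly run has already been collected into a list of video IDs, and it uses Jaccard similarity as the overlap measure. The names `jaccard`, `pairwise_overlap`, and the sample IDs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections of video IDs (1.0 = identical sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty result sets are treated as identical by convention.
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_overlap(snapshots):
    """Map each pair of snapshot labels to the Jaccard similarity of their results.

    `snapshots` is a dict of label -> iterable of video IDs, one entry per run
    of the same query (hypothetical structure, assumed for this sketch).
    """
    return {
        (x, y): jaccard(snapshots[x], snapshots[y])
        for x, y in combinations(sorted(snapshots), 2)
    }


# Made-up video IDs standing in for three weekly runs of one identical query.
snapshots = {
    "week01": ["vidA", "vidB", "vidC", "vidD"],
    "week02": ["vidA", "vidB", "vidE", "vidF"],
    "week03": ["vidG", "vidH", "vidA", "vidI"],
}

overlap = pairwise_overlap(snapshots)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

Low pairwise scores across runs of the same query would indicate that a single collection pass captures only part of what the endpoint can return, which is the kind of instability the abstract describes.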
