Prediction of Video Popularity in the Absence of Reliable Data from Video Hosting Services: Utility of Traces Left by Users on the Web
Alexey Drutsa (Yandex, Moscow, Russia), Gleb Gusev (Yandex, Moscow, Russia), Pavel Serdyukov (Yandex, Moscow, Russia)

TL;DR
This paper explores predicting video popularity using web traces like embeds and links, especially when direct data from hosting services is unavailable, improving prediction accuracy for content aggregators.
Contribution
It introduces a novel approach to predict video popularity from web traces and internal logs, reducing reliance on direct hosting service data.
Findings
Web traces significantly improve prediction accuracy.
Embedding and link data can replace API data when unavailable.
Prediction models perform well with combined web and internal data.
Abstract
With the growth of user-generated content, we observe the constant rise of the number of companies, such as search engines, content aggregators, etc., that operate with tremendous amounts of web content not being the services hosting it. Thus, aiming to locate the most important content and promote it to the users, they face the need of estimating the current and predicting the future content popularity. In this paper, we approach the problem of video popularity prediction not from the side of a video hosting service, as done in all previous studies, but from the side of an operating company, which provides a popular video search service that aggregates content from different video hosting websites. We investigate video popularity prediction based on features from three primary sources available for a typical operating company: first, the content hosting provider may deliver its data…
Taxonomy
Topics: Web Data Mining and Analysis · Complex Network Analysis Techniques · Caching and Content Delivery
