Scalable Privacy-Compliant Virality Prediction on Twitter
Damian Konrad Kowalczyk, Jan Larsen

TL;DR
This paper introduces a scalable, privacy-compliant framework for predicting Twitter content virality using a novel gradient boosting method that achieves state-of-the-art ranking performance across multiple languages.
Contribution
It presents a new data acquisition and analysis framework that ensures privacy, along with a gradient boosting model that provides explainable, accurate virality predictions on large-scale Twitter data.
Findings
Achieves state-of-the-art virality ranking results
Model performs well across 18 languages
Framework ensures privacy compliance during data processing
Abstract
The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention on Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, high accuracy supervisory signal and…
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Sentiment Analysis and Opinion Mining
