Scalable Privacy-Compliant Virality Prediction on Twitter

Damian Konrad Kowalczyk; Jan Larsen

arXiv:1812.06034 · cs.SI · February 23, 2021 · 1 citation

TL;DR

This paper introduces a scalable, privacy-compliant framework for predicting Twitter content virality using a novel gradient boosting method that achieves state-of-the-art ranking performance across multiple languages.

Contribution

It presents a new data acquisition and analysis framework that ensures privacy, along with a gradient boosting model that provides explainable, accurate virality predictions on large-scale Twitter data.

Findings

01. Achieved state-of-the-art virality ranking results

02. Model performs well across 18 languages

03. Framework ensures privacy compliance during data processing
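The first finding concerns ranking quality, which this summary does not quantify. As a rough illustration only, the sketch below scores a virality ranking with Spearman's rank correlation, a common choice for comparing a predicted ordering against observed popularity. The choice of metric, the function names, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    if len(predicted) < 2:
        raise ValueError("need at least two items to rank")
    rx, ry = average_ranks(predicted), average_ranks(observed)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: model scores for four tweets and their retweet counts.
predicted_scores = [0.9, 0.1, 0.4, 0.7]
observed_retweets = [120, 3, 40, 15]
rho = spearman_rho(predicted_scores, observed_retweets)
print(f"Spearman rho: {rho:.3f}")
```

Because only the ordering matters, a rank-based score like this is unaffected by the heavy-tailed scale of retweet counts, which is one reason it suits comparisons of virality rankings.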

Abstract

The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, one question remains more relevant than ever: how to model the dynamics of attention on Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data at the cost of many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage, and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy, and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, high-accuracy supervisory signal and…

Taxonomy

Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Sentiment Analysis and Opinion Mining