Information flow reveals prediction limits in online social activity
James P. Bagrow, Xipei Liu, Lewis Mitchell

TL;DR
This paper uses information theory to show that an individual's social ties alone provide up to 95% of the potential predictive accuracy for that individual's online activity, revealing fundamental limits and privacy implications of social data analysis.
Contribution
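The paper demonstrates that an individual's social ties contain most of the predictive information about that individual's online behavior. It establishes an upper bound on the available predictive information that holds for any predictive or machine learning method, and it analyzes information flow in social networks.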
Findings
95% of the potential predictive accuracy for an individual is achievable from their social ties alone, without that individual's own data
As few as 8-9 contacts suffice for high predictability
Information flow analysis reveals temporal and social effects
Abstract
Modern society depends on the flow of information over online social networks, and users of popular platforms generate significant behavioral data about themselves and their social ties. However, it remains unclear what fundamental limits exist when using these data to predict the activities and interests of individuals, and to what accuracy such predictions can be made using an individual's social ties. Here we show that 95% of the potential predictive accuracy for an individual is achievable using their social ties only, without requiring that individual's data. We use information theoretic tools to estimate the predictive information within the writings of Twitter users, providing an upper bound on the available predictive information that holds for any predictive or machine learning methods. As few as 8-9 of an individual's contacts are sufficient to obtain predictability comparable…
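One way to see how an information-theoretic estimate can bound predictive accuracy for any method is through Fano's inequality, which relates an entropy (in bits per symbol) to the highest achievable prediction accuracy. The sketch below is illustrative only: the visible abstract does not state which estimator or bound the authors use, and the function names `binary_entropy` and `max_predictability`, along with the vocabulary size parameter, are hypothetical choices made for this example.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-outcome variable with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def max_predictability(entropy_bits: float, vocab_size: int, tol: float = 1e-10) -> float:
    """Upper bound on prediction accuracy implied by Fano's inequality.

    Solves S = H(p) + (1 - p) * log2(N - 1) for p in [1/N, 1], where S is the
    entropy in bits and N is the number of possible symbols (assumed inputs).
    """
    if vocab_size < 2:
        raise ValueError("vocab_size must be at least 2")
    if not 0.0 <= entropy_bits <= math.log2(vocab_size):
        raise ValueError("entropy must lie between 0 and log2(vocab_size)")

    def fano(p: float) -> float:
        return binary_entropy(p) + (1.0 - p) * math.log2(vocab_size - 1)

    # fano() decreases from log2(N) at p = 1/N to 0 at p = 1, so bisection applies.
    lo, hi = 1.0 / vocab_size, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano(mid) > entropy_bits:
            lo = mid  # accuracy mid is still consistent with more entropy than S
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical inputs: 6 bits of entropy per word over a 5,000-word vocabulary.
    print(max_predictability(6.0, 5000))
```

Lower entropy yields a higher bound: a fully deterministic sequence (zero entropy) gives a bound of 1, while a uniformly random one gives 1/N. Because the bound depends only on the entropy and not on a model, it applies to any predictor, which is the sense in which such an estimate is method-independent.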
