Assessing the Bias in Communication Networks Sampled from Twitter

Sandra González-Bailón; Ning Wang; Alejandro Rivero; Javier Borge-Holthoefer; Yamir Moreno

arXiv:1212.1684 · physics.soc-ph · December 10, 2012


TL;DR

This paper compares two Twitter data sampling methods (the search and stream APIs) over a month of political protests, revealing biases that distort the reconstructed communication networks and underscoring the need for more representative sampling techniques.

Contribution

It provides an empirical analysis of sampling biases in Twitter communication networks, highlighting differences between the search and stream APIs during a real-world event.

Findings

01. Search API over-represents central users

02. Bias is greater in mention networks

03. Sampling bias affects diffusion and collective action studies

Abstract

We collect and analyse messages exchanged on Twitter using two of the platform's publicly available APIs (the search and stream specifications). We assess the differences between the two samples, and compare the networks of communication reconstructed from them. The empirical context is given by political protests taking place in May 2012: we track online communication around these protests for the period of one month, and reconstruct the network of mentions and re-tweets according to the two samples. We find that the search API over-represents the more central users and does not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions. We discuss the implications of this bias for the study of diffusion dynamics and collective action in the digital era, and advocate the need for more uniform sampling procedures in the study of…
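The comparison described in the abstract can be illustrated with a minimal sketch. It assumes tweets arrive as simplified (author, text) pairs, treats any tweet beginning with `RT @user` as a retweet and every other `@user` as a mention, and measures bias with a simple statistic chosen here for illustration: the share of users in the reference (stream) network who also appear in the sampled (search) network, split into "central" and "peripheral" users by a degree threshold. The names `build_networks`, `degrees` and `coverage_by_degree` are hypothetical, and the statistic is a stand-in, not the authors' measure.

```python
from collections import Counter
import re

# Assumption: each tweet is a simplified (author, text) pair; real API payloads
# carry structured entity fields instead of requiring text parsing.
RT_PATTERN = re.compile(r"^RT @(\w+)")
MENTION_PATTERN = re.compile(r"@(\w+)")


def build_networks(tweets):
    """Split tweets into weighted mention and retweet edge lists keyed by (source, target)."""
    mentions, retweets = Counter(), Counter()
    for author, text in tweets:
        rt = RT_PATTERN.match(text)
        if rt:
            # Convention chosen here: edge points from retweeter to original author.
            retweets[(author, rt.group(1))] += 1
        else:
            for target in MENTION_PATTERN.findall(text):
                if target != author:  # ignore self-mentions
                    mentions[(author, target)] += 1
    return mentions, retweets


def degrees(edges):
    """Count distinct directed edges touching each user (in-degree plus out-degree)."""
    deg = Counter()
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return deg


def coverage_by_degree(reference_edges, sample_edges, threshold):
    """Share of reference users, split by degree, that also appear in the sample."""
    sample_users = set(degrees(sample_edges))
    counts = {"peripheral": [0, 0], "central": [0, 0]}  # [found, total]
    for user, d in degrees(reference_edges).items():
        group = counts["central" if d >= threshold else "peripheral"]
        group[1] += 1
        if user in sample_users:
            group[0] += 1
    return {k: found / total if total else 0.0 for k, (found, total) in counts.items()}


# Hypothetical toy data: the "search" sample misses one peripheral exchange.
stream = [
    ("alice", "RT @hub: march at noon"),
    ("bob", "RT @hub: march at noon"),
    ("carol", "@hub see you there"),
    ("dave", "@erin are you going?"),
    ("hub", "@alice thanks"),
]
search = [t for t in stream if t[0] != "dave"]

stream_mentions, _ = build_networks(stream)
search_mentions, _ = build_networks(search)
print(coverage_by_degree(stream_mentions, search_mentions, threshold=2))
# → {'peripheral': 0.5, 'central': 1.0}
```

In this toy mention network the sample captures the single high-degree user but only half of the peripheral ones, the same direction of asymmetry the paper reports at scale; the data are invented and carry no empirical weight.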
