Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media
Gabriele Di Bona, Emma Fraxanet, Björn Komander, Andrea Lo Sasso, Virginia Morini, Antoine Vendeville, Max Falkenberg, Alessandro Galeazzi

TL;DR
This study examines how sampling methods and data restrictions impact the accuracy of measuring political polarization on social media, revealing significant biases in small or poorly selected samples.
Contribution
The paper demonstrates that small samples, or samples collected with a poorly chosen set of keywords, can substantially bias polarization measurements, underscoring the need for careful sampling in social media research.
Findings
Large samples can be representative of overall polarization
Small samples often fail to reflect true polarization
Keyword selection critically affects sample bias
Abstract
Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions are impeding robust research pertaining to social and political phenomena online, which is critical due to the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization measures obtained from different samples of social media data by studying the structural polarization of the Polish political debate on Twitter over a 24-hour period. First, we show that the political discussion on Twitter is only a small subset of the wider Twitter discussion. Second, we find that large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online. Finally, we…
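To make the sample-size point concrete, the following is a minimal sketch on synthetic data, not the authors' method or dataset. It assumes a toy interaction network with two known camps and uses the Krackhardt-Stern E-I index as a stand-in polarization score; the paper's own structural measures are not specified in this summary. The function names, network size, and cross-camp probability are all hypothetical choices made for illustration.

```python
import random
import statistics


def ei_index(edges, group):
    """E-I index: (external - internal) / total edges; -1 means fully segregated."""
    external = sum(1 for u, v in edges if group[u] != group[v])
    internal = len(edges) - external
    return (external - internal) / len(edges)


def make_toy_network(n_users=400, n_edges=5000, p_cross=0.1, seed=0):
    """Build a synthetic two-camp network (assumption: 10% cross-camp edges)."""
    rng = random.Random(seed)
    group = {u: u % 2 for u in range(n_users)}
    sides = (
        [u for u in group if group[u] == 0],
        [u for u in group if group[u] == 1],
    )
    edges = []
    while len(edges) < n_edges:
        u = rng.randrange(n_users)
        cross = rng.random() < p_cross
        v = rng.choice(sides[1 - group[u]] if cross else sides[group[u]])
        if u != v:  # skip self-loops
            edges.append((u, v))
    return edges, group


def sample_spread(edges, group, sample_size, repeats=200, seed=1):
    """Mean and standard deviation of the E-I index over random edge samples."""
    rng = random.Random(seed)
    estimates = [
        ei_index(rng.sample(edges, sample_size), group) for _ in range(repeats)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    edges, group = make_toy_network()
    print(f"full network E-I index: {ei_index(edges, group):.3f}")
    for size in (20, 200, 2000):
        mean, spread = sample_spread(edges, group, size)
        print(f"sample of {size:>4} edges: mean={mean:.3f}, std={spread:.3f}")
```

Uniform edge sampling with known camp labels only exposes sampling noise: estimates from small samples scatter widely around the full-network value. Keyword-based collection and community detection on a sparse sampled graph, which are the settings the abstract describes, can add systematic distortions that this sketch does not attempt to model.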
Taxonomy
Topics: Social Media and Politics · Hate Speech and Cyberbullying Detection
