Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media

Gabriele Di Bona; Emma Fraxanet; Björn Komander; Andrea Lo Sasso; Virginia Morini; Antoine Vendeville; Max Falkenberg; Alessandro Galeazzi

arXiv:2406.19867 · cs.SI · July 1, 2024 · 1 citation

Open Access

TL;DR

This study examines how sampling methods and data-access restrictions affect the accuracy of political polarization measurements on social media, revealing substantial bias in small or poorly selected samples.

Contribution

The study demonstrates that small samples, or samples collected with too few or poorly chosen keywords, can substantially bias polarization measurements, underscoring the need for careful sampling in social media research.

Findings

1. Large samples can be representative of overall polarization.
2. Small samples often fail to reflect true polarization.
3. Keyword selection critically affects sample bias.

Abstract

Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions impede robust research on social and political phenomena online, research that is critical given the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization measures obtained from different samples of social media data by studying the structural polarization of the Polish political debate on Twitter over a 24-hour period. First, we show that the political discussion on Twitter is only a small subset of the wider Twitter discussion. Second, we find that large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online. Finally, we…
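To make the sample-size point concrete, here is a hypothetical illustration, not the authors' pipeline. It assumes structural polarization is proxied by the Krackhardt-Stern E-I index, (external - internal) / (external + internal) edges between two camps, on a synthetic two-camp interaction network. It uses uniform random edge sampling, which isolates sampling noise only; it does not model the keyword-based sampling the paper studies. The functions `ei_index`, `synthetic_network`, and `sample_spread` and all parameter values are illustrative assumptions.

```python
import random
from statistics import mean, pstdev


def ei_index(edges, group):
    """Krackhardt-Stern E-I index: (external - internal) / (external + internal).

    Values near -1 mean almost all edges stay inside a camp, i.e. strong
    structural polarization (used here only as an illustrative proxy).
    """
    external = sum(1 for u, v in edges if group[u] != group[v])
    internal = len(edges) - external
    return (external - internal) / (external + internal)


def synthetic_network(n_per_side, n_edges, p_cross, rng):
    """Hypothetical two-camp network: each edge crosses camps with probability p_cross.

    Within-camp edges may occasionally be self-loops; they count as internal,
    which is harmless for this illustration.
    """
    group = {i: (0 if i < n_per_side else 1) for i in range(2 * n_per_side)}
    left = range(n_per_side)
    right = range(n_per_side, 2 * n_per_side)
    edges = []
    for _ in range(n_edges):
        if rng.random() < p_cross:
            edges.append((rng.choice(left), rng.choice(right)))
        else:
            side = left if rng.random() < 0.5 else right
            edges.append((rng.choice(side), rng.choice(side)))
    return edges, group


def sample_spread(edges, group, sample_size, n_trials, rng):
    """Mean and standard deviation of the E-I index over uniform random edge samples."""
    scores = [ei_index(rng.sample(edges, sample_size), group) for _ in range(n_trials)]
    return mean(scores), pstdev(scores)


if __name__ == "__main__":
    rng = random.Random(42)
    edges, group = synthetic_network(n_per_side=500, n_edges=20_000, p_cross=0.1, rng=rng)
    print(f"full network E-I: {ei_index(edges, group):.3f}")
    for k in (50, 500, 5_000):
        m, s = sample_spread(edges, group, k, n_trials=200, rng=rng)
        print(f"sample of {k:>5} edges: mean {m:.3f}, std {s:.3f}")
```

In this toy setting the mean over samples stays close to the full-network value, but the spread across individual small samples is much wider, so any single small sample can misstate polarization. Systematic bias from keyword selection, as described in the findings, would come on top of this noise and is not captured here.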

Taxonomy

Topics: Social Media and Politics · Hate Speech and Cyberbullying Detection