Panning for gold: Lessons learned from the platform-agnostic automated detection of political content in textual data
Mykola Makhortykh, Ernesto de León, Aleksandra Urman, Clara Christner, Maryna Sydorova, Silke Adam, Michaela Maier, and Teresa Gil-Lopez

TL;DR
This paper evaluates automated methods for detecting political content in online texts across platforms, comparing dictionary, machine learning, and neural network techniques using diverse datasets.
Contribution
It systematically compares detection techniques and preprocessing impacts, providing insights into their effectiveness across different data qualities.
Findings
Neural network and machine learning models perform best on less noisy data.
Dictionary-based models are more robust on noisy data.
Preprocessing has limited impact on detection performance.
Abstract
The growing availability of data about online information behaviour enables new possibilities for political communication research. However, the volume and variety of these data make them difficult to analyse and prompt the need for developing automated content approaches relying on a broad range of natural language processing techniques (e.g. machine learning- or neural network-based ones). In this paper, we discuss how these techniques can be used to detect political content across different platforms. Using three validation datasets, which include a variety of political and non-political textual documents from online platforms, we systematically compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks. We also examine the impact of different modes of data preprocessing (e.g. stemming and stopword…
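To make the dictionary-based family of techniques concrete, the following is a minimal sketch of how such a detector might work, with stopword removal as an optional preprocessing step. It is not the authors' implementation: the term list, the stopword list, the function names, and the 0.1 threshold are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-dictionary and stopword list, for illustration only;
# neither is taken from the paper.
POLITICAL_TERMS = {"election", "parliament", "minister", "vote", "party", "government"}
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def tokenize(text, remove_stopwords=False):
    """Lowercase the text and split it into alphabetic tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def political_share(text, remove_stopwords=False):
    """Return the fraction of tokens that appear in the dictionary."""
    tokens = tokenize(text, remove_stopwords)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in POLITICAL_TERMS)
    return hits / len(tokens)


def is_political(text, threshold=0.1, remove_stopwords=False):
    """Label a document as political if its dictionary share reaches the threshold (assumed value)."""
    return political_share(text, remove_stopwords) >= threshold
```

Because the score is a share of tokens, removing stopwords shrinks the denominator and raises the score for the same text, so any threshold would need to be tuned separately for each preprocessing mode.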
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Media Influence and Politics · Misinformation and Its Impacts
