Small data problems in political research: a critical replication study
Hugo de Vos, Suzan Verberne

TL;DR
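This study critically replicates a 2019 paper by Anastasopoulos & Whitford on machine learning for classifying political tweets. It shows that the small dataset makes the classification model highly sensitive to the random train-test split and that the applied preprocessing leaves the data extremely sparse. These results challenge the original conclusions and underline the need for larger datasets and validation.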
Contribution
The paper analyzes model stability and the effects of preprocessing in relation to small data size in political text classification, highlighting limitations of the earlier findings.
Findings
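The small dataset makes the classification model highly sensitive to variations in the random train-test split.
The applied preprocessing leaves the data extremely sparse: the majority of items have at most two non-zero lexical features.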
Small data issues persist regardless of preprocessing choices.
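The first finding can be illustrated with a minimal sketch in Python. It is not A&W's model or the replication code: the toy data generator, the nearest-centroid classifier, and the names make_toy_data and split_accuracies are assumptions made for illustration. The sketch repeats a random train-test split under different seeds and collects the test accuracy of each split, so the spread across splits can be compared between a small and a larger dataset.

```python
import random
import statistics


def make_toy_data(n_items, seed=0):
    """Hypothetical balanced binary dataset with one weakly informative feature."""
    rng = random.Random(seed)
    data = []
    for i in range(n_items):
        label = i % 2  # alternate labels so both classes are equally frequent
        feature = rng.gauss(float(label), 1.0)  # class means 0 and 1, unit noise
        data.append((feature, label))
    return data


def accuracy_for_split(data, split_seed, test_fraction=0.2):
    """Shuffle with split_seed, fit nearest-centroid on train, score on test."""
    rng = random.Random(split_seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # One centroid (mean feature value) per class, estimated on train only.
    centroid = {
        c: statistics.mean(x for x, y in train if y == c) for c in (0, 1)
    }
    correct = 0
    for x, y in test:
        predicted = min((0, 1), key=lambda c: abs(x - centroid[c]))
        if predicted == y:
            correct += 1
    return correct / len(test)


def split_accuracies(data, n_splits=50):
    """Test accuracy for n_splits different random train-test splits."""
    return [accuracy_for_split(data, split_seed=s) for s in range(n_splits)]


if __name__ == "__main__":
    # Compare the spread of accuracies for a small and a larger toy dataset.
    for n_items in (40, 2000):
        accs = split_accuracies(make_toy_data(n_items))
        print(n_items, min(accs), max(accs), round(statistics.pstdev(accs), 3))
```

On synthetic data of this kind, the spread of accuracies is typically much wider for the small dataset, because each test set contains only a handful of items.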
Abstract
In an often-cited 2019 paper on the use of machine learning in political research, Anastasopoulos & Whitford (A&W) propose a text classification method for tweets related to organizational reputation. The aim of their paper was to provide a 'guide to practice' for public administration scholars and practitioners on the use of machine learning. In the current paper we follow up on that work with a replication of A&W's experiments and additional analyses on model stability and the effects of preprocessing, both in relation to the small data size. We show that (1) the small data causes the classification model to be highly sensitive to variations in the random train-test split, and that (2) the applied preprocessing causes the data to be extremely sparse, with the majority of items in the data having at most two non-zero lexical features. With additional experiments in which we vary the…
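The sparsity claim in (2) can be checked by counting, per item, how many lexical features remain non-zero after preprocessing. The sketch below assumes a simple pipeline (lowercasing, stopword removal, and dropping terms that occur in fewer than min_doc_freq documents). The stopword list, the threshold, the example texts, and all function names are hypothetical and not taken from A&W's pipeline.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "is", "to", "of", "and", "in", "for", "on", "we"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_vocabulary(docs, min_doc_freq=2):
    """Keep only terms that occur in at least min_doc_freq documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {term for term, df in doc_freq.items() if df >= min_doc_freq}


def nonzero_feature_counts(docs, vocabulary):
    """Number of distinct vocabulary terms (non-zero features) per document."""
    return [len(set(tokenize(doc)) & vocabulary) for doc in docs]


def share_with_at_most(counts, k=2):
    """Fraction of documents with at most k non-zero features."""
    return sum(1 for c in counts if c <= k) / len(counts)


if __name__ == "__main__":
    # Made-up example texts, not data from the paper.
    docs = [
        "The agency is great",
        "Great service today",
        "Agency delays again",
        "Nothing to report",
    ]
    vocab = build_vocabulary(docs)
    counts = nonzero_feature_counts(docs, vocab)
    print(sorted(vocab), counts, share_with_at_most(counts, k=1))
```

With short texts and a small corpus, most terms fall below the document-frequency threshold, so many items end up with very few non-zero features.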
Taxonomy
Topics: Political Influence and Corporate Strategies · Social Media and Politics · Public Policy and Administration Research
