Does Twitter know your political views? POLiTweets dataset and semi-automatic method for political leaning discovery
Joanna Baran, Michał Kajstura, Maciej Ziółkowski, Krzysztof Rajda

TL;DR
This paper introduces POLiTweets, a new dataset, and a semi-automatic method for predicting political leanings from Twitter data. The method reaches a 0.85 F1-score from as few as 13 posts per user, and the paper also analyzes the domain shift between ordinary citizens and politicians.
Contribution
It presents a novel semi-automated annotation method for political leaning detection and introduces the first open Polish dataset for multi-party political affiliation analysis.
Findings
Achieved 0.85 F1-score with only 13 posts per user
Developed a semi-automatic annotation procedure with 0.95 agreement (accuracy) with human annotators
Analyzed domain shift between ordinary citizens and politicians
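The Findings report a single F1-score for a multi-party setup, but this page does not say how per-party scores are combined. The sketch below assumes macro averaging (the unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN)). The function name `macro_f1` and the party labels are hypothetical and chosen for illustration. This is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("need at least one labelled item")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1      # correct prediction for this class
        else:
            fp[pred_label] += 1      # predicted class gets a false positive
            fn[true_label] += 1      # true class gets a false negative

    # Per-class F1 written as 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R).
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Hypothetical user-level labels for three parties.
gold = ["party_a", "party_a", "party_b", "party_c"]
pred = ["party_a", "party_b", "party_b", "party_c"]
print(round(macro_f1(gold, pred), 4))
```

On these toy labels, the per-class scores are 2/3, 2/3, and 1, so the macro average is 7/9. On the same inputs, the result should agree with scikit-learn's `f1_score(..., average="macro")`. A weighted average would instead give larger parties more influence, so the choice matters when party sizes are unbalanced.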
Abstract
Every day, the world is flooded with millions of messages and statements posted on Twitter or Facebook. Social media platforms try to protect users' personal data, but there is still a real risk of misuse, including election manipulation. Did you know that only 13 posts addressing topics that are important or controversial for society are enough to predict one's political affiliation with a 0.85 F1-score? To examine this phenomenon, we created a novel, universal method of semi-automated political leaning discovery. It relies on a heuristic data annotation procedure, which was evaluated to achieve 0.95 agreement with human annotators (measured as accuracy). We also present POLiTweets, the first publicly open Polish dataset for political affiliation discovery in a multi-party setup, consisting of over 147k tweets from almost 10k Polish-writing users annotated heuristically and almost…
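The abstract measures agreement between the heuristic annotation and human annotators as accuracy. In other words, it is the share of annotated items that receive the same label from both. The sketch below assumes one label per item, here taken to be one label per user. The function name `agreement_accuracy` and the labels are hypothetical. The heuristic itself is not described on this page and is not reproduced here.

```python
def agreement_accuracy(heuristic_labels, human_labels):
    """Fraction of items where the heuristic label equals the human label."""
    if len(heuristic_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("need at least one labelled item")
    # Count positions where both annotation sources assign the same label.
    matches = sum(h == g for h, g in zip(heuristic_labels, human_labels))
    return matches / len(human_labels)


# Hypothetical labels for four users: the sources disagree on the third one.
heuristic = ["party_a", "party_b", "party_a", "party_c"]
human = ["party_a", "party_b", "party_c", "party_c"]
print(agreement_accuracy(heuristic, human))
```

Plain accuracy does not correct for chance agreement. If a chance-corrected figure were needed, a statistic such as Cohen's kappa would be the usual alternative. The abstract, however, reports accuracy.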
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Social Media and Politics · Misinformation and Its Impacts
Methods: Test
