Electoral Agitation Data Set: The Use Case of the Polish Election
Mateusz Baran, Mateusz Wójcik, Piotr Kolebski, Michał Bernaczyk, Krzysztof Rajda, Łukasz Augustyniak, Tomasz Kajdanowicz

TL;DR
This paper introduces the first publicly available Polish electoral agitation tweet dataset, annotated with four legally conditioned categories, and demonstrates its use in fine-tuning a language model to detect electioneering messages.
Contribution
The paper provides a novel annotated dataset for electoral agitation detection in Polish and uses it to fine-tune a language model, enabling better monitoring of election-related social media content.
Findings
Achieved 0.66 inter-annotator agreement (Cohen's kappa)
Fine-tuned HerBERT reached a 68% F1 score
Analyzed Polish 2020 Presidential Election tweets
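The agreement figure above is reported in the abstract as Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between two annotators and p_e is the agreement expected by chance from each annotator's label frequencies. The following is a minimal Python sketch of that formula, not the authors' evaluation code; the function name `cohen_kappa` and the placeholder labels "A" to "D" are hypothetical stand-ins for the paper's four actual categories.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal frequencies, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels; "A" to "D" are placeholders, not the paper's category names.
annotator_1 = ["A", "A", "B", "C", "D", "A", "B", "C"]
annotator_2 = ["A", "B", "B", "C", "D", "A", "B", "D"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # → 0.667
```

Here the annotators agree on 6 of 8 items (p_o = 0.75) and chance agreement is 0.25, giving κ ≈ 0.667. For real evaluations, a library implementation such as scikit-learn's `cohen_kappa_score` computes the same statistic.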
Abstract
The popularity of social media leads politicians to use it for political advertising. As a result, social media is full of electoral agitation (electioneering), especially during election campaigns. The election administration cannot track the spread and quantity of messages that count as agitation under the election code. This work addresses a crucial problem while also uncovering a niche that has not been effectively targeted so far. Hence, we present the first publicly available data set for detecting electoral agitation in the Polish language. It contains 6,112 human-annotated tweets tagged with four legally conditioned categories. We achieved an inter-annotator agreement of 0.66 (Cohen's kappa). An additional annotator resolved the mismatches between the first two, improving the consistency and complexity of the annotation process. The newly created data set was used to fine-tune a Polish…
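The abstract describes a two-stage annotation: two annotators label each tweet, and an additional annotator resolves the items on which they disagree. The sketch below shows one plausible way to merge labels under that scheme; it is an assumption about the workflow, not the authors' tooling. The function `adjudicate` and its `resolve(index, label_a, label_b)` callback (standing in for the third annotator) are hypothetical, as are the placeholder labels.

```python
def adjudicate(labels_a, labels_b, resolve):
    """Merge two annotators' labels, sending disagreements to a third annotator.

    `resolve(index, label_a, label_b)` is a hypothetical callback standing in for
    the additional annotator; it must return the final label for that item.
    Returns the final labels and the indices of the items that were disputed.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    final, disputed = [], []
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a == b:
            # Agreement: keep the shared label without consulting the third annotator.
            final.append(a)
        else:
            # Mismatch: record the item and take the third annotator's decision.
            disputed.append(i)
            final.append(resolve(i, a, b))
    return final, disputed


# Hypothetical example: the third annotator's decisions are looked up by item index.
first = ["A", "A", "B", "C", "D", "A", "B", "C"]
second = ["A", "B", "B", "C", "D", "A", "B", "D"]
third_decisions = {1: "B", 7: "C"}
labels, disputed = adjudicate(first, second, lambda i, a, b: third_decisions[i])
print(labels)    # → ['A', 'B', 'B', 'C', 'D', 'A', 'B', 'C']
print(disputed)  # → [1, 7]
```

Passing the resolver as a callback keeps the merge logic independent of how the third annotator's decisions are collected (a lookup table here, an annotation tool in practice).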
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Social Media and Politics · Sentiment Analysis and Opinion Mining
