TL;DR
This paper trains supervised learning models to detect organized behavior in sets of tweets gathered during the 2016 US presidential election. The best model, Random Forest, exceeds 95% average accuracy, and features such as user activity and media use help distinguish coordinated campaigns.
Contribution
It introduces a machine learning classification approach that labels tweet sets as organized vs organic, political vs non-political, and pro-Trump vs pro-Hillary vs neither, with a focus on feature importance.
Findings
Random Forest achieved greater than 95% average accuracy and F-measure
User-based features are most influential
Media use and favorites are key indicators
Abstract
During the 2016 US elections, Twitter experienced unprecedented levels of propaganda and fake news through the collaboration of bots and hired persons, the ramifications of which are still being debated. This work proposes an approach to identify the presence of organized behavior in tweets. The Random Forest, Support Vector Machine, and Logistic Regression algorithms are each used to train a model on a data set of 850 records, each consisting of 299 features extracted from tweets gathered during the 2016 US presidential election. The features represent user and temporal synchronization characteristics to capture coordinated behavior. The models are trained to classify tweet sets into the categories organic vs organized, political vs non-political, and pro-Trump vs pro-Hillary vs neither. The random forest algorithm performs better with greater than 95% average accuracy and F-measure…
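The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses randomly generated data with the same shape as the paper's data set (850 records, 299 features), and picks 5-fold cross-validation, a binary label, and all hyperparameters arbitrarily. The numbers it prints say nothing about the paper's results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the paper's data (850 records x 299 features).
# The real user and temporal synchronization features are not reproduced here.
X, y = make_classification(
    n_samples=850, n_features=299, n_informative=20,
    n_classes=2, random_state=0,
)

# The three algorithms named in the abstract; hyperparameters are assumptions.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Average accuracy and macro-averaged F-measure over 5 folds (fold count assumed).
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=5, scoring=("accuracy", "f1_macro"))
    results[name] = {
        "accuracy": float(np.mean(scores["test_accuracy"])),
        "f1_macro": float(np.mean(scores["test_f1_macro"])),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f} "
          f"f1_macro={results[name]['f1_macro']:.3f}")

# Feature importance from a random forest fitted on all records.
# The indices refer to synthetic columns, not to the paper's named features.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top_features = np.argsort(forest.feature_importances_)[::-1][:5]
print("top feature indices:", top_features.tolist())
```

For the three-way task (pro-Trump vs pro-Hillary vs neither), the same loop would apply with a three-class label, since all three estimators handle multiclass targets and `f1_macro` averages the F-measure over classes.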
