BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline
Alexander Shevtsov, Despoina Antonakaki, Ioannis Lamprou, Polyvios Pratikakis, Sotiris Ioannidis

TL;DR
This paper presents BotArtist, a new semi-automatic machine learning pipeline for Twitter bot detection that outperforms existing methods by nearly 10% in F1-score and provides a large labeled dataset for future research.
Contribution
Introduction of BotArtist, a novel bot detection model based on user profile features, and the creation of one of the largest labeled Twitter bot datasets for research.
Findings
BotArtist outperforms state-of-the-art methods by nearly 10% in F1-score.
The dataset includes over 10 million Twitter profiles with bot/human labels.
The evaluation was conducted across nine public datasets under standardized conditions.
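For readers unfamiliar with the metric behind the headline comparison, the following is a minimal sketch of how an F1-score is computed for a binary bot/human classifier. It assumes the bot class is treated as the positive label; the summary does not say which F1 variant (binary, macro, or weighted) the paper reports. The function name and the toy labels are illustrative only and are not taken from the paper.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration (1 = bot, 0 = human); not data from the paper.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is also 2/3.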
Abstract
Twitter, as one of the most popular social networks, provides a platform for communication and online discourse. Unfortunately, it has also become a target for bots and fake accounts, resulting in the spread of false information and manipulation. This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges associated with machine learning model development. Through this pipeline, we develop a comprehensive bot detection model named BotArtist, based on user profile features. SAMLP leverages nine distinct publicly available datasets to train the BotArtist model. To assess BotArtist's performance against current state-of-the-art solutions, we evaluate 35 existing Twitter bot detection methods, each utilizing a diverse range of features. Our comparative evaluation of BotArtist and these existing methods, conducted across nine public datasets…
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Hate Speech and Cyberbullying Detection
