TwiBot-20: A Comprehensive Twitter Bot Detection Benchmark

Shangbin Feng; Herun Wan; Ningnan Wang; Jundong Li; Minnan Luo

arXiv:2106.13088 · cs.SI · August 30, 2021

TL;DR

TwiBot-20 is presented as the largest and most diverse Twitter bot detection benchmark at the time of publication, with extensive user data intended to improve the training and evaluation of bot detection methods.

Contribution

The paper introduces TwiBot-20, a comprehensive, large-scale Twitter bot detection dataset with diverse user information for more effective benchmarking.

Findings

1. Existing methods perform poorly on TwiBot-20.
2. TwiBot-20 covers diverse user types and data modalities.
3. Benchmarking reveals challenges in current bot detection approaches.

Abstract

Twitter has become a vital social media platform, yet a large number of malicious Twitter bots exist and induce undesirable social effects. Successful Twitter bot detection proposals are generally supervised and rely heavily on large-scale datasets. However, existing benchmarks generally suffer from low user diversity, limited user information, and data scarcity, so they are not sufficient to train and stably benchmark bot detection measures. To alleviate these problems, we present TwiBot-20, a massive Twitter bot detection benchmark that contains 229,573 users, 33,488,192 tweets, 8,723,736 user property items, and 455,958 follow relationships. TwiBot-20 covers diversified bots and genuine users to better represent the real-world Twittersphere. TwiBot-20 also includes three modalities of user information to support both binary classification of single users…
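
To get a rough sense of the scale quoted in the abstract, the totals can be turned into per-user averages. The sketch below uses only the four counts given above. The variable and function names are illustrative, and the resulting ratios are simple arithmetic on those totals, not figures reported by the authors.

```python
# Dataset-wide totals as quoted in the TwiBot-20 abstract.
users = 229_573
tweets = 33_488_192
property_items = 8_723_736
follow_edges = 455_958


def per_user(total: int, n_users: int = users) -> float:
    """Return the mean count per user for a dataset-wide total."""
    return total / n_users


# Hypothetical summary structure, used here for illustration only.
averages = {
    "tweets": per_user(tweets),
    "property_items": per_user(property_items),
    "follow_edges": per_user(follow_edges),
}

for name, value in averages.items():
    print(f"{name} per user: {value:.2f}")
```

These are averages over all users in the benchmark, so they say nothing about how tweets or follow relationships are distributed across individual accounts.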
