Lexical analysis of automated accounts on Twitter
Isa Inuwa-Dutse, Bello Shehu Bello, Ioannis Korkontzelos

TL;DR
This paper investigates lexical features of tweets to distinguish social bot accounts from human accounts, demonstrating that lexical diversity and emoticon usage are effective indicators for detection.
Contribution
It introduces lexical analysis as a novel feature set for social bot detection, improving classification accuracy over existing methods.
Findings
Lexical diversity and emoticon distribution differ significantly between bots and humans.
Lexical features improve machine learning classification performance.
A new dataset for social bot detection is provided.
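To make the two signals named in the findings concrete, here is a minimal sketch of how lexical diversity and emoticon usage could be measured per account. It is not the authors' implementation: the type-token ratio is assumed here as one common lexical-diversity measure, and the function names, the tokenizer and the small ASCII emoticon pattern are all hypothetical choices made for illustration.

```python
import re

# Assumption: a deliberately small ASCII emoticon pattern such as :) ;-) =D :P.
# The lookbehind keeps the colon in "http://" from being counted.
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][-']?[)(DPp]")

# Assumption: a naive tokenizer that keeps lowercase letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split one tweet into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def type_token_ratio(tweets):
    """Unique tokens divided by total tokens across an account's tweets."""
    tokens = [tok for tweet in tweets for tok in tokenize(tweet)]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def emoticon_rate(tweets):
    """Average number of ASCII emoticons per tweet."""
    if not tweets:
        return 0.0
    total = sum(len(EMOTICON_RE.findall(tweet)) for tweet in tweets)
    return total / len(tweets)


def lexical_features(tweets):
    """Bundle both measures into a feature dict for a downstream classifier."""
    return {
        "type_token_ratio": type_token_ratio(tweets),
        "emoticon_rate": emoticon_rate(tweets),
    }
```

An account that repeats the same few words scores a low type-token ratio, while varied text scores closer to 1. Note that the raw ratio depends on text length, so accounts should be compared on similar amounts of text.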
Abstract
In recent years, social bots have been using increasingly sophisticated strategies that challenge detection. While many approaches and features have been proposed, social bots evade detection and interact much like humans, making it difficult to distinguish real human accounts from bot accounts. For detection systems, various features under the broader categories of account profile, tweet content, network and temporal pattern have been utilised. The use of tweet content features is limited to analysis of basic terms such as URLs, hashtags, named entities and sentiment. Given a set of tweet contents with no obvious pattern, can we distinguish contents produced by social bots from those of humans? We aim to answer this question by analysing the lexical richness of tweets produced by the respective accounts using large collections of different datasets. Our results show a clear margin…
Taxonomy
Topics: Spam and Phishing Detection · Sentiment Analysis and Opinion Mining · Web Data Mining and Analysis
