Lexical analysis of automated accounts on Twitter

Isa Inuwa-Dutse; Bello Shehu Bello; Ioannis Korkontzelos

arXiv:1812.07947 · cs.SI · December 20, 2018 · 5 cites

Open Access

TL;DR

This paper investigates lexical features of tweets to distinguish social bot accounts from human accounts, demonstrating that lexical diversity and emoticon usage are effective indicators for detection.

Contribution

It introduces lexical analysis as a novel feature set for social bot detection, improving classification accuracy over existing methods.

Findings

01. Lexical diversity and emoticon distribution differ significantly between bots and humans.

02. Lexical features improve machine learning classification performance.

03. A new dataset for social bot detection is provided.

Abstract

In recent years, social bots have adopted increasingly sophisticated strategies that make detection challenging. While many approaches and features have been proposed, social bots evade detection and interact much like humans, making it difficult to distinguish real human accounts from bot accounts. Detection systems have utilised various features under the broad categories of account profile, tweet content, network and temporal pattern. The use of tweet content features has been limited to analysis of basic terms such as URLs, hashtags, named entities and sentiment. Given a set of tweet contents with no obvious pattern, can we distinguish contents produced by social bots from those produced by humans? We aim to answer this question by analysing the lexical richness of tweets produced by the respective accounts using large collections of different datasets. Our results show a clear margin…
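
To make the idea of lexical richness concrete, here is a minimal Python sketch of two simple per-tweet signals: a type-token ratio (unique tokens divided by total tokens) and an emoticon count. The abstract as shown here does not say which richness measures or preprocessing the authors use, so the tokenizer, the emoticon pattern, and the function names `type_token_ratio` and `emoticon_count` are all illustrative assumptions, not the paper's method.

```python
import re

# Assumption: a simple lowercase word tokenizer; the paper's actual
# preprocessing is not described in this excerpt.
TOKEN_RE = re.compile(r"[a-z0-9']+")

# Assumption: a small illustrative pattern for western-style emoticons
# such as :) ;-( =D :P. It is far from exhaustive.
EMOTICON_RE = re.compile(r"[:;=][\-']?[\)\(\]\[DPp]")


def type_token_ratio(text: str) -> float:
    """Return unique tokens / total tokens, or 0.0 for text with no tokens."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def emoticon_count(text: str) -> int:
    """Count non-overlapping matches of the illustrative emoticon pattern."""
    return len(EMOTICON_RE.findall(text))


if __name__ == "__main__":
    # Hypothetical example tweets, not taken from the paper's datasets.
    samples = [
        "buy now buy now buy now",
        "had a lovely walk by the river today :)",
    ]
    for tweet in samples:
        print(tweet, type_token_ratio(tweet), emoticon_count(tweet))
```

A repetitive text yields a low ratio and a varied one a ratio near 1. Because the type-token ratio depends on text length, comparisons are more meaningful between texts of similar size or with a length-corrected measure.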

Taxonomy

Topics: Spam and Phishing Detection · Sentiment Analysis and Opinion Mining · Web Data Mining and Analysis