Sifting Robotic from Organic Text: A Natural Language Approach for Detecting Automation on Twitter
Eric M. Clark, Jake Ryland Williams, Chris A. Jones, Richard A. Galbraith, Christopher M. Danforth, Peter Sheridan Dodds

TL;DR
This paper introduces a natural language processing method to detect automated accounts on Twitter solely based on their text content, offering a flexible tool applicable to various textual datasets.
Contribution
The study presents a novel text-only classification approach for identifying Twitter bots, moving beyond metadata-based detection methods.
Findings
Effective in distinguishing bots from organic users using text analysis
Applicable to other textual data beyond Twitter
Operates independently of account metadata
Abstract
Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. Due to the increasing popularity of Twitter, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities can range from the benevolent (e.g., weather-update bots, help-wanted-alert bots) to the malevolent (e.g., spamming messages, advertisements, or radical opinions). Existing detection algorithms typically leverage meta-data (time between tweets, number of followers, etc.) to identify robotic accounts. Here, we present a powerful classification scheme that exclusively uses the natural language text from organic users to provide a criterion for identifying accounts posting automated messages. Since the classifier operates…
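The idea of calibrating on organic users' text and flagging accounts that fall outside that range can be sketched as follows. This is a minimal illustration, not the authors' classifier: the feature (mean pairwise Jaccard distance between an account's tweets), the mean ± k standard deviations band, and all function names are assumptions chosen for the example.

```python
from itertools import combinations
from statistics import mean, stdev


def jaccard_distance(a, b):
    """Word-level Jaccard distance between two tweets (0 = same word set)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def mean_pairwise_dissimilarity(tweets):
    """Average distance over all pairs of tweets from one account."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        raise ValueError("need at least two tweets")
    return mean(jaccard_distance(a, b) for a, b in pairs)


def organic_band(organic_accounts, k=2.0):
    """Hypothetical criterion: mean +/- k standard deviations of the
    feature, measured only on accounts known to be organic."""
    scores = [mean_pairwise_dissimilarity(t) for t in organic_accounts]
    mu, sigma = mean(scores), stdev(scores)
    return mu - k * sigma, mu + k * sigma


def looks_automated(tweets, band):
    """Flag an account whose text feature falls outside the organic band."""
    low, high = band
    score = mean_pairwise_dissimilarity(tweets)
    return not (low <= score <= high)
```

Given a list of organic accounts (each a list of tweet strings), `organic_band` yields the reference interval, and `looks_automated` checks a new account against it using text alone, with no metadata involved. An account that repeats the same message scores near zero and falls below the band.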
