Writer Identification Using Microblogging Texts for Social Media Forensics
Fernando Alonso-Fernandez, Nicole Mariah Sharon Belvisi, Kevin Hernandez-Diaz, Naveed Muhammad, Josef Bigun

TL;DR
This paper investigates authorship identification of Twitter messages using stylometric and platform-specific features, demonstrating high accuracy with sufficient training data and offering insights into feature effectiveness and computational aspects.
Contribution
It introduces a comprehensive evaluation of stylometric and Twitter-specific features for authorship attribution on short texts, with automatic feature selection and analysis of performance across different data sizes.
Findings
Rank-5 accuracy above 80% with more than 500 training Tweets and only a few dozen test Tweets, even with several thousand authors.
With small training samples (10-20 Tweets), the candidate search space can still be reduced by 9-15%.
Verification error rate below 15% with hundreds of training Tweets.
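The Rank-5 figure above is a rank-k identification rate: a test sample counts as a hit when its true author appears among the k best-scoring candidates. A minimal sketch of that metric follows, assuming each test sample already has a similarity score per candidate author (higher is better). The function name `rank_k_accuracy` and the score layout are illustrative assumptions, not taken from the paper.

```python
def rank_k_accuracy(scores, true_authors, k=5):
    """Fraction of test samples whose true author is among the top-k candidates.

    scores: list of dicts, one per test sample, mapping author id -> similarity
            (assumed layout; higher similarity means a better match).
    true_authors: list of the correct author id for each test sample.
    """
    if len(scores) != len(true_authors):
        raise ValueError("scores and true_authors must have the same length")
    hits = 0
    for per_author, truth in zip(scores, true_authors):
        # Sort candidate authors from most to least similar.
        ranked = sorted(per_author, key=per_author.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_authors)


# Toy example with three candidate authors and two test samples.
toy_scores = [
    {"a": 0.9, "b": 0.1, "c": 0.5},
    {"a": 0.2, "b": 0.3, "c": 0.8},
]
toy_truth = ["a", "b"]
rank1 = rank_k_accuracy(toy_scores, toy_truth, k=1)
rank2 = rank_k_accuracy(toy_scores, toy_truth, k=2)
```

In the toy data the second sample's true author is only the second-best candidate, so it is missed at rank 1 but counted at rank 2.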
Abstract
Establishing authorship of online texts is fundamental to combat cybercrimes. Unfortunately, text length is limited on some platforms, making the challenge harder. We aim to identify the authorship of Twitter messages limited to 140 characters. We evaluate popular stylometric features, widely used in literary analysis, and specific Twitter features like URLs, hashtags, replies or quotes. We use two databases with 93 and 3957 authors, respectively. We test author sets of varying size and varying amounts of training/test texts per author. Performance is further improved by feature combination via automatic selection. With a large number of training Tweets (>500), a good accuracy (Rank-5>80%) is achievable with only a few dozen test Tweets, even with several thousand authors. With smaller sample sizes (10-20 training Tweets), the search space can be diminished by 9-15% while…
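To make the two feature families mentioned in the abstract more concrete, the sketch below counts a few simple stylometric quantities (characters, words, uppercase letters, digits) alongside Twitter-specific ones (URLs, hashtags, mentions, reply marker). It is only an illustration: the regular expressions, the feature names, and the rule that a Tweet starting with "@" is a reply are all assumptions made here, not the feature set used in the paper.

```python
import re

# Illustrative patterns (assumptions, not the paper's definitions).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def tweet_features(text):
    """Return a dict of simple per-Tweet counts."""
    # Remove URLs before counting hashtags and mentions so that
    # '#fragment' or '@' inside a link is not miscounted.
    no_urls = URL_RE.sub(" ", text)
    return {
        "n_chars": len(text),
        "n_words": len(text.split()),
        "n_upper": sum(c.isupper() for c in text),
        "n_digits": sum(c.isdigit() for c in text),
        "n_urls": len(URL_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(no_urls)),
        "n_mentions": len(MENTION_RE.findall(no_urls)),
        # Assumed heuristic: a Tweet that starts with a mention is a reply.
        "is_reply": text.lstrip().startswith("@"),
    }


example = tweet_features("@bob Check https://example.com #NLP #forensics!")
```

Per-Tweet dictionaries like this could be averaged over an author's training Tweets to form a profile vector, which is one simple way to feed such features into a matcher.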
