Entropy-based Classification of 'Retweeting' Activity on Twitter
Rumi Ghosh, Tawan Surachawala, Kristina Lerman

TL;DR
This paper introduces an entropy-based method to classify Twitter retweeting activities, effectively distinguishing between various user behaviors such as spam, news sharing, and promotional campaigns using only two features.
Contribution
It presents a novel, scalable, and robust information-theoretic approach for classifying Twitter retweeting activities based on time-interval and user entropy features.
Findings
Successfully categorized five distinct retweeting activities
Achieved high accuracy in activity separation using minimal features
Demonstrated method's robustness to sampling and missing data
Abstract
Twitter is used for a variety of purposes, including information dissemination, marketing, political organizing, propaganda, spamming, promotion, conversations, and so on. Characterizing these activities and categorizing associated user-generated content is a challenging task. We present an information-theoretic approach to classification of user activity on Twitter. We focus on tweets that contain embedded URLs and study their collective 'retweeting' dynamics. We identify two features, time-interval and user entropy, which we use to classify retweeting activity. We achieve good separation of different activities using just these two features and are able to categorize content based on the collective user response it generates. We have identified five distinct categories of retweeting activity on Twitter: automatic/robotic activity, newsworthy information dissemination,…
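The abstract names the two features but does not give their formulas here. The following is a minimal sketch of how they might be computed, assuming both are Shannon entropies of empirical distributions: one over the time gaps between consecutive retweets of a URL, the other over the users who retweeted it. The function names, the `(timestamp_seconds, user_id)` input format, and the use of base-2 logarithms are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over the distinct values observed.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def retweet_entropies(retweets):
    """Return (time_interval_entropy, user_entropy) for one URL.

    `retweets` is a list of (timestamp_seconds, user_id) pairs; this input
    format is a hypothetical choice made for the sketch.
    """
    ordered = sorted(retweets, key=lambda r: r[0])
    times = [t for t, _ in ordered]
    # Gaps between consecutive retweets, at one-second granularity (assumed).
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    users = [u for _, u in ordered]
    return shannon_entropy(intervals), shannon_entropy(users)


# One account retweeting at a fixed 60-second period.
bot_like = [(0, "bot"), (60, "bot"), (120, "bot"), (180, "bot")]
# Four different accounts retweeting at irregular times.
organic_like = [(0, "a"), (5, "b"), (65, "c"), (70, "d")]

print(retweet_entropies(bot_like))
print(retweet_entropies(organic_like))
```

Under these assumptions, a single account posting on a fixed schedule yields zero entropy on both features, while retweets spread across many users at irregular intervals score higher on both. This is the kind of separation a two-feature classifier could exploit, though the thresholds and category boundaries used in the paper are not stated in this summary.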
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Complex Network Analysis Techniques
