The Trumpiest Trump? Identifying a Subject's Most Characteristic Tweets
Charuta Pethe, Steven Skiena

TL;DR
This paper develops models to identify and quantify how representative a tweet is of a celebrity's typical style, linking this to tweet popularity and enabling better content summarization.
Contribution
It introduces a novel approach for measuring tweet representativeness and demonstrates its correlation with popularity, validated through human evaluation and multiple classification methods.
Findings
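Binary author detection reaches a test accuracy of 90.37% with the best of five approaches.
In a user study, human evaluators agree with the model's characterization scores for all 15 celebrities (p-value < 0.05 for each).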
Significant correlation between representativeness scores and tweet popularity.
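To make the last finding concrete, here is a minimal sketch of how a correlation between representativeness scores and popularity could be measured. The summary does not say which statistic the authors used, so Spearman rank correlation is an assumption here. The `scores` and `likes` values are made-up toy numbers, not data from the paper.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: one representativeness score and one like count per tweet.
scores = [0.91, 0.35, 0.78, 0.12, 0.66]
likes = [5400, 800, 3100, 950, 1200]

rho = spearman(scores, likes)
print(round(rho, 3))
```

A rank-based statistic is a reasonable choice for this kind of data because like counts are heavily skewed, and ranks are not dominated by a few viral tweets.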
Abstract
The sequence of documents produced by any given author varies in style and content, but some documents are more typical or representative of the source than others. We quantify the extent to which a given short text is characteristic of a specific person, using a dataset of tweets from fifteen celebrities. Such analysis is useful for generating excerpts of high-volume Twitter profiles, and understanding how representativeness relates to tweet popularity. We first consider the related task of binary author detection (is x the author of text T?), and report a test accuracy of 90.37% for the best of five approaches to this problem. We then use these models to compute characterization scores among all of an author's texts. A user study shows human evaluators agree with our characterization model for all 15 celebrities in our dataset, each with p-value < 0.05. We use these classifiers to…
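The two-step idea in the abstract (train a binary author detector, then reuse its output to score each of the author's texts) can be sketched as follows. This is not the authors' implementation: the summary does not name the five approaches, so a small multinomial Naive Bayes classifier written from scratch is swapped in as an assumption. The function names `train_binary_nb` and `characterization_score` and the example texts are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()

def train_binary_nb(author_texts, other_texts, alpha=1.0):
    """Collect per-class token counts for a two-class Naive Bayes model."""
    author_counts, other_counts = Counter(), Counter()
    for t in author_texts:
        author_counts.update(tokenize(t))
    for t in other_texts:
        other_counts.update(tokenize(t))
    return {
        "author": author_counts,
        "other": other_counts,
        "author_total": sum(author_counts.values()),
        "other_total": sum(other_counts.values()),
        "vocab_size": len(set(author_counts) | set(other_counts)),
        "prior_log_odds": math.log(len(author_texts) / len(other_texts)),
        "alpha": alpha,
    }

def characterization_score(model, text):
    """Estimated probability that `text` was written by the target author."""
    a, v = model["alpha"], model["vocab_size"]
    log_odds = model["prior_log_odds"]
    for w in tokenize(text):
        if w not in model["author"] and w not in model["other"]:
            continue  # skip tokens never seen in training
        p_author = (model["author"][w] + a) / (model["author_total"] + a * v)
        p_other = (model["other"][w] + a) / (model["other_total"] + a * v)
        log_odds += math.log(p_author / p_other)
    # Numerically stable logistic function of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

# Hypothetical toy texts, not real tweets.
author_texts = [
    "tremendous rally tonight great crowd",
    "great deal tremendous success",
    "fake news again sad",
]
other_texts = [
    "new album out friday",
    "thank you for the love",
    "excited for the tour tonight",
]

model = train_binary_nb(author_texts, other_texts)

# Rank the author's own texts from most to least characteristic.
ranked = sorted(author_texts, key=lambda t: characterization_score(model, t), reverse=True)
for t in ranked:
    print(round(characterization_score(model, t), 3), t)
```

Using the classifier's probability as the score means the most characteristic text is the one the detector is most confident about, which fits the abstract's description of computing characterization scores from the author detection models.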
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Hate Speech and Cyberbullying Detection
