You Are What You Tweet: Profiling Users by Past Tweets to Improve Hate Speech Detection
Prateek Chaudhry, Matthew Lease

TL;DR
This paper explores using users' past tweets as context to improve hate speech detection on Twitter, showing promising results but facing challenges due to data limitations and annotation differences.
Contribution
It profiles users by their past tweets and embeds that context into a strong baseline hate speech detection model, with the aim of improving prediction accuracy.
Findings
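Augmented three Twitter hate speech datasets with user timeline data
Obtained promising detection results when past tweets are added as context to a strong baseline
Identified complications from differing annotation schemes, Twitter API limitations, and data sharing policies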
Abstract
Hate speech detection research has predominantly focused on purely content-based methods, without exploiting any additional context. We briefly critique pros and cons of this task formulation. We then investigate profiling users by their past utterances as an informative prior to better predict whether new utterances constitute hate speech. To evaluate this, we augment three Twitter hate speech datasets with additional timeline data, then embed this additional context into a strong baseline model. Promising results suggest merit for further investigation, though analysis is complicated by differences in annotation schemes and processes, as well as Twitter API limitations and data sharing policies.
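The idea of using past utterances as an informative prior can be illustrated with a minimal sketch. This is not the authors' model: the hashed bag-of-words encoder, the mean-pooled profile, the feature dimension, and the function names `embed`, `user_profile`, and `features` are all assumptions made for illustration. The sketch only shows one simple way that timeline context could be combined with a tweet's own content features before classification.

```python
import zlib
from typing import List

DIM = 8  # toy feature dimension (assumption, chosen for illustration)


def embed(text: str, dim: int = DIM) -> List[float]:
    """Hashed bag-of-words vector; a stand-in for any real text encoder."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 gives a hash that is stable across runs, unlike built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def user_profile(past_tweets: List[str], dim: int = DIM) -> List[float]:
    """Mean of the past-tweet vectors; all zeros when no history is available."""
    if not past_tweets:
        return [0.0] * dim
    vecs = [embed(t, dim) for t in past_tweets]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def features(tweet: str, past_tweets: List[str], dim: int = DIM) -> List[float]:
    """Content features of the new tweet followed by user-context features."""
    return embed(tweet, dim) + user_profile(past_tweets, dim)
```

A downstream classifier would then be trained on `features(tweet, timeline)` instead of `embed(tweet)` alone. The zero-vector fallback reflects the practical situation in which a user's timeline cannot be retrieved, so the model falls back to content-only evidence.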
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection
