You Are What You Tweet: Profiling Users by Past Tweets to Improve Hate Speech Detection

Prateek Chaudhry; Matthew Lease

arXiv:2012.09090·cs.CL·December 14, 2021

TL;DR

This paper explores using users' past tweets as context to improve hate speech detection on Twitter, showing promising results but facing challenges due to data limitations and annotation differences.

Contribution

It introduces a method that profiles users by their past tweets and embeds that context into a strong baseline hate speech detection model, with promising results.

Findings

01 Augmented datasets with user timeline data

02 Improved hate speech detection performance

03 Identified challenges with data sharing and annotation schemes

Abstract

Hate speech detection research has predominantly focused on purely content-based methods, without exploiting any additional context. We briefly critique pros and cons of this task formulation. We then investigate profiling users by their past utterances as an informative prior to better predict whether new utterances constitute hate speech. To evaluate this, we augment three Twitter hate speech datasets with additional timeline data, then embed this additional context into a strong baseline model. Promising results suggest merit for further investigation, though analysis is complicated by differences in annotation schemes and processes, as well as Twitter API limitations and data sharing policies.
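
The idea of using past utterances as a prior can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify how the timeline context is embedded, so the hashed bag-of-words encoder, the averaging step, the concatenation, and every name below (`embed`, `user_profile`, `build_features`, `DIM`) are assumptions made purely for illustration.

```python
import zlib

DIM = 16  # hypothetical embedding size, chosen only for illustration


def embed(text, dim=DIM):
    """Toy hashed bag-of-words vector; a stand-in for a real text encoder."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 gives a hash that is stable across runs, unlike built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def user_profile(past_tweets, dim=DIM):
    """Average the vectors of a user's past tweets; zeros if no history."""
    if not past_tweets:
        return [0.0] * dim
    vectors = [embed(t, dim) for t in past_tweets]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_features(tweet, past_tweets, dim=DIM):
    """Concatenate the content vector with the user-profile vector."""
    return embed(tweet, dim) + user_profile(past_tweets, dim)


features = build_features(
    "example new tweet",
    ["an earlier tweet", "another earlier tweet"],
)
print(len(features))  # 2 * DIM values: content part, then profile part
```

A downstream classifier would then consume the combined vector instead of the content vector alone. Falling back to a zero profile when no timeline is available is one simple way to handle users whose past tweets cannot be retrieved, a situation the abstract attributes to Twitter API limitations and data sharing policies.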

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection