A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research

Mohammadreza Rezvan; Saeedeh Shekarpour; Lakshika Balasuriya; Krishnaprasad Thirunarayan; Valerie Shalin; Amit Sheth

arXiv:1802.09416·cs.CL·May 25, 2018

TL;DR

This paper introduces a high-quality annotated Twitter corpus and lexicon covering five types of harassment, offering a resource for cyberbullying research and a candidate standard benchmark, which the community currently lacks.

Contribution

It presents a novel annotated corpus and lexicon that distinguish multiple harassment types, supporting type-aware detection and analysis of cyberbullying behaviors.

Findings

01. 25,000 annotated tweets across five harassment types
02. A new lexicon of offensive words for harassment detection
03. Resources shared publicly with the research community

Abstract

Having a quality annotated corpus is essential, especially for applied research. Despite the Web science community's recent focus on cyberbullying research, the community still does not have standard benchmarks. In this paper, we publish, first, a quality annotated corpus and, second, an offensive-words lexicon capturing different types of harassment: (i) sexual harassment, (ii) racial harassment, (iii) appearance-related harassment, (iv) intellectual harassment, and (v) political harassment. We crawled data from Twitter using our offensive lexicon. We then relied on human judgment to annotate the collected tweets with respect to the contextual types, because the presence of offensive words alone is not sufficient to reliably detect harassment. Our corpus consists of 25,000 annotated tweets in five contextual types. We are pleased to share this novel annotated corpus and the lexicon with the research…
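The two-stage pipeline in the abstract (lexicon-based crawling, then human annotation by type) can be illustrated with a minimal sketch of the first stage. It assumes a type-keyed lexicon and simple whitespace/punctuation tokenization. The names `TOY_LEXICON`, `candidate_types`, and `route_for_annotation` are hypothetical, and the lexicon entries are toy placeholders, not terms from the authors' lexicon. A lexicon hit only routes a tweet to an annotation queue for a harassment type; it does not label the tweet as harassment. This mirrors the abstract's point that offensive words alone are insufficient.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical toy lexicon keyed by the five harassment types.
# Entries are illustrative placeholders, NOT the published lexicon.
TOY_LEXICON: Dict[str, Set[str]] = {
    "sexual": {"sexual_term"},
    "racial": {"racial_term"},
    "appearance": {"ugly", "fat"},
    "intellectual": {"stupid", "idiot"},
    "political": {"traitor"},
}

# Lowercase word tokens; punctuation is dropped.
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def tokenize(text: str) -> Set[str]:
    """Return the set of lowercase tokens in a tweet."""
    return set(TOKEN_RE.findall(text.lower()))


def candidate_types(text: str, lexicon: Dict[str, Set[str]] = TOY_LEXICON) -> List[str]:
    """List (sorted) the harassment types whose lexicon terms appear in the text."""
    tokens = tokenize(text)
    return sorted(htype for htype, terms in lexicon.items() if tokens & terms)


def route_for_annotation(
    tweets: Iterable[str], lexicon: Dict[str, Set[str]] = TOY_LEXICON
) -> Dict[str, List[str]]:
    """Group lexicon-matching tweets into per-type queues for human annotators.

    A tweet matching several types is queued under each one; tweets with no
    lexicon hit are not collected at all.
    """
    queues: Dict[str, List[str]] = defaultdict(list)
    for tweet in tweets:
        for htype in candidate_types(tweet, lexicon):
            queues[htype].append(tweet)
    return dict(queues)


if __name__ == "__main__":
    sample = ["You are such an idiot", "What an ugly idiot", "Nice weather today"]
    print(route_for_annotation(sample))
```

In this sketch, "What an ugly idiot" is queued under both `appearance` and `intellectual`. Deciding which type, if any, the tweet actually expresses is left to human annotators, as the abstract describes.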
