OffensiveLang: A Community Based Implicit Offensive Language Dataset

Amit Das; Mostafa Rahgouy; Dongji Feng; Zheng Zhang; Tathagata Bhattacharya; Nilanjana Raychawdhary; Fatemeh Jamshidi; Vinija Jain; Aman Chadha; Mary Sandage; Lauramarie Pope; Gerry Dozier; Cheryl Seals

arXiv:2403.02472·cs.CL·December 17, 2024·1 cites

TL;DR

This paper introduces OffensiveLang, a novel community-based dataset of implicit offensive language generated with ChatGPT, addressing the challenge of detecting subtle hate speech without explicit keywords and incorporating community context.

Contribution

The paper presents a new dataset for implicit offensive language, created using prompt-based ChatGPT generation and human evaluation, filling gaps in existing explicit keyword-based datasets.

Findings

01. ChatGPT effectively generates implicit offensive language data.

02. State-of-the-art models show limited effectiveness in detecting implicit offensive content.

03. Community context enhances understanding of offensive language.

Abstract

The widespread presence of hateful language on social media has adversely affected societal well-being, making it a high priority to address. Hate speech and offensive language exist in both explicit and implicit forms, with the latter being more challenging to detect. Current research in this domain faces several challenges. First, existing datasets primarily rely on collecting texts that contain explicit offensive keywords, which makes it hard to capture implicitly offensive content that lacks these keywords. Second, common methodologies tend to focus solely on textual analysis, neglecting the valuable insights that community information can provide. In this paper, we introduce OffensiveLang, a novel community-based implicit offensive language dataset generated by ChatGPT 3.5…
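The first challenge named in the abstract, that keyword-based collection misses implicit content, can be illustrated with a minimal sketch. The keyword list, the function name `contains_explicit_keyword`, and both example sentences are invented for illustration; none of them come from the paper or from the OffensiveLang dataset.

```python
import re

# Hypothetical keyword list, for illustration only; not taken from the paper.
EXPLICIT_KEYWORDS = {"idiot", "stupid", "moron"}


def contains_explicit_keyword(text: str) -> bool:
    """Return True if any lowercase word in the text is in the keyword list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in EXPLICIT_KEYWORDS for token in tokens)


# Invented examples; neither sentence comes from OffensiveLang.
explicit_example = "You are such an idiot."
implicit_example = "People like you should stick to simpler jobs."

print(contains_explicit_keyword(explicit_example))
print(contains_explicit_keyword(implicit_example))
```

The keyword filter flags the first sentence but not the second, even though the second can be demeaning in context. A collection pipeline built on such a filter would therefore never gather sentences of the second kind.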

Code & Models

Repositories

amitdasrup123/offensivelang (Official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection