Detecting Offensive Content in Open-domain Conversations using Two Stage Semi-supervision

Chandra Khatri; Behnam Hedayatnia; Rahul Goel; Anushree Venkatesh; Raefer Gabriel; Arindam Mandal

arXiv:1811.12900·cs.CL·December 3, 2018·6 cites

Open Access

TL;DR

This paper introduces a two-stage semi-supervised method for detecting sensitive content in open-domain conversations, leveraging web data and weak supervision to improve detection accuracy across multiple sensitive categories.

Contribution

The authors propose a novel semi-supervised data collection and training approach that enhances sensitive content detection without extensive manual annotations.

Findings

1. Model trained on semi-supervised data outperforms baselines with 95.5% F1 score.

2. Method generalizes well across multiple sensitive content categories.

3. Large-scale semi-supervision improves out-of-domain detection and recall.

Abstract

As open-ended human-chatbot interaction becomes commonplace, sensitive content detection gains importance. In this work, we propose a two stage semi-supervised approach to bootstrap large-scale data for automatic sensitive language detection from publicly available web resources. We explore various data selection methods including 1) using a blacklist to rank online discussion forums by the level of their sensitiveness followed by randomly sampling utterances and 2) training a weakly supervised model in conjunction with the blacklist for scoring sentences from online discussion forums to curate a dataset. Our data collection strategy is flexible and allows the models to detect implicit sensitive content for which manual annotations may be difficult. We train models using publicly available annotated datasets as well as using the proposed large-scale semi-supervised datasets. We evaluate…
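The first data selection method in the abstract (rank forums with a blacklist, then randomly sample utterances) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenization, the scoring rule (fraction of utterances containing a blacklisted token), and the names `rank_forums`, `sample_from_top`, `top_k`, and `n_per_forum` are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Set, Tuple


def contains_blacklisted(utterance: str, blacklist: Set[str]) -> bool:
    """Return True if any whitespace-separated token is in the blacklist.

    Assumption: naive lowercase/whitespace tokenization, no punctuation handling.
    """
    return any(tok in blacklist for tok in utterance.lower().split())


def rank_forums(
    forums: Dict[str, List[str]], blacklist: Set[str]
) -> List[Tuple[str, float]]:
    """Rank forums by the fraction of utterances containing a blacklisted token.

    Assumption: this ratio stands in for the "level of sensitiveness";
    the paper's actual scoring may differ.
    """
    scores = []
    for name, utterances in forums.items():
        if not utterances:
            continue  # skip empty forums to avoid division by zero
        hits = sum(contains_blacklisted(u, blacklist) for u in utterances)
        scores.append((name, hits / len(utterances)))
    # Highest ratio first
    return sorted(scores, key=lambda item: item[1], reverse=True)


def sample_from_top(
    forums: Dict[str, List[str]],
    ranking: List[Tuple[str, float]],
    top_k: int,
    n_per_forum: int,
    seed: int = 0,
) -> List[str]:
    """Randomly sample up to n_per_forum utterances from each of the top_k forums."""
    rng = random.Random(seed)
    sampled: List[str] = []
    for name, _score in ranking[:top_k]:
        pool = forums[name]
        sampled.extend(rng.sample(pool, min(n_per_forum, len(pool))))
    return sampled
```

Sampling randomly from highly ranked forums, rather than keeping only utterances that match the blacklist, is what would let such a dataset include sensitive utterances with no blacklisted word, which is consistent with the abstract's point about implicit sensitive content.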

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Spam and Phishing Detection