Detecting Offensive Content in Open-domain Conversations using Two Stage Semi-supervision
Chandra Khatri, Behnam Hedayatnia, Rahul Goel, Anushree Venkatesh, Raefer Gabriel, Arindam Mandal

TL;DR
This paper introduces a two-stage semi-supervised method for detecting sensitive content in open-domain conversations, leveraging web data and weak supervision to improve detection accuracy across multiple sensitive categories.
Contribution
The authors propose a novel semi-supervised data collection and training approach that enhances sensitive content detection without extensive manual annotations.
Findings
Model trained on semi-supervised data outperforms baselines, reaching a 95.5% F1 score.
Method generalizes well across multiple sensitive content categories.
Large-scale semi-supervision improves out-of-domain detection and recall.
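For reference, the F1 score cited above is the harmonic mean of precision P and recall R, defined over true positives (TP), false positives (FP), and false negatives (FN). This summary does not say whether the 95.5% figure is per-class, micro-averaged, or macro-averaged.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```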
Abstract
As open-ended human-chatbot interaction becomes commonplace, sensitive content detection gains importance. In this work, we propose a two-stage semi-supervised approach to bootstrap large-scale data for automatic sensitive language detection from publicly available web resources. We explore various data selection methods, including 1) using a blacklist to rank online discussion forums by their level of sensitiveness, followed by randomly sampling utterances, and 2) training a weakly supervised model in conjunction with the blacklist to score sentences from online discussion forums and curate a dataset. Our data collection strategy is flexible and allows the models to detect implicit sensitive content for which manual annotations may be difficult. We train models using publicly available annotated datasets as well as the proposed large-scale semi-supervised datasets. We evaluate…
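The two data-selection strategies in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The toy blacklist, the forum data, the function names (`forum_sensitivity`, `stage_one`, `stage_two`), and the whitespace tokenizer are all assumptions. So are the add-one-smoothed Naive Bayes classifier standing in for the weakly supervised model and the linear mix of model probability and blacklist hit (weight `alpha`) used for scoring.

```python
import math
import random
from collections import Counter

# Hypothetical toy blacklist; a real one would be much larger.
BLACKLIST = {"idiot", "stupid", "hate"}


def tokens(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def has_blacklisted(text, blacklist=BLACKLIST):
    """True if any token of `text` is in the blacklist."""
    return any(tok in blacklist for tok in tokens(text))


def forum_sensitivity(utterances, blacklist=BLACKLIST):
    """Fraction of a forum's utterances that contain a blacklisted term."""
    if not utterances:
        return 0.0
    hits = sum(has_blacklisted(u, blacklist) for u in utterances)
    return hits / len(utterances)


def stage_one(forums, k=1, n=2, seed=0):
    """Stage 1: rank forums by blacklist sensitivity, then randomly sample.

    Up to n utterances from each of the k most sensitive forums get weak
    label 1; up to n from each of the k least sensitive get weak label 0.
    Assumes len(forums) >= 2 * k so the two groups do not overlap.
    """
    rng = random.Random(seed)
    ranked = sorted(forums, key=lambda f: forum_sensitivity(forums[f]), reverse=True)
    data = []
    for label, names in ((1, ranked[:k]), (0, ranked[-k:])):
        for name in names:
            pool = forums[name]
            for u in rng.sample(pool, min(n, len(pool))):
                data.append((u, label))
    return data


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (stand-in weak model)."""

    def fit(self, data):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter()
        for text, label in data:
            self.docs[label] += 1
            self.counts[label].update(tokens(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_sensitive(self, text):
        """Posterior probability of label 1, computed in log space."""
        total = sum(self.docs.values())
        logp = {}
        for c in (0, 1):
            denom = sum(self.counts[c].values()) + len(self.vocab)
            lp = math.log((self.docs[c] + 1) / (total + 2))
            for tok in tokens(text):
                lp += math.log((self.counts[c][tok] + 1) / denom)
            logp[c] = lp
        m = max(logp.values())
        e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
        return e1 / (e0 + e1)


def stage_two(model, sentences, alpha=0.5, threshold=0.5):
    """Stage 2: score sentences by mixing model probability and blacklist hits."""
    curated = []
    for s in sentences:
        score = alpha * model.prob_sensitive(s) + (1 - alpha) * has_blacklisted(s)
        curated.append((s, score, int(score >= threshold)))
    return curated


if __name__ == "__main__":
    forums = {
        "r/rants": ["you are an idiot", "i hate this stupid thing", "what a day", "stupid idea"],
        "r/cooking": ["add salt to taste", "bake for ten minutes", "this soup is great"],
    }
    model = NaiveBayes().fit(stage_one(forums, k=1, n=2))
    for sentence, score, label in stage_two(model, ["what an idiot", "bake the bread"]):
        print(label, round(score, 2), sentence)
```

With the default `alpha=0.5` and `threshold=0.5`, any sentence containing a blacklisted term is selected and a sentence without one never is. Raising `alpha` or lowering `threshold` lets the model's score alone admit sentences, which is where a learned model could contribute implicit cases the blacklist misses. The paper's actual scoring and selection rules may differ.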
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Spam and Phishing Detection
