Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation
Huachuan Qiu, Shuai Zhang, Hongliang He, Anqi Li, Zhenzhong Lan

TL;DR
This paper introduces CensorChat, a dialogue monitoring dataset built through knowledge distillation from GPT-4 and ChatGPT, to improve NSFW dialogue detection and user safety in open-domain dialogue systems.
Contribution
It presents a novel dataset and a cost-effective knowledge distillation approach for NSFW detection in dialogue systems, addressing a gap in sexually explicit content identification.
Findings
A BERT classifier achieves improved detection accuracy.
Knowledge distillation reduces labeling costs.
Annotating with ChatGPT and GPT-4 is an effective strategy.
Abstract
NSFW (Not Safe for Work) content, in the context of a dialogue, can have severe side effects on users in open-domain dialogue systems. However, research on detecting NSFW language, especially sexually explicit content, within a dialogue context has significantly lagged behind. To address this issue, we introduce CensorChat, a dialogue monitoring dataset aimed at NSFW dialogue detection. Leveraging knowledge distillation techniques involving GPT-4 and ChatGPT, this dataset offers a cost-effective means of constructing NSFW content detectors. The process entails collecting real-life human-machine interaction data and breaking it down into single utterances and single-turn dialogues, with the chatbot delivering the final utterance. ChatGPT is employed to annotate unlabeled data, serving as a training set. Rationale validation and test sets are constructed using ChatGPT and GPT-4 as…
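The decomposition step described in the abstract, splitting a human-machine conversation into single utterances and single-turn dialogues that end with the chatbot's utterance, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `(speaker, text)` tuple format, the `"user"`/`"bot"` speaker labels, and the `decompose` function name are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each turn is (speaker, text),
# where speaker is "user" or "bot". The paper does not specify a format.
Turn = Tuple[str, str]


def decompose(dialogue: List[Turn]) -> Dict[str, list]:
    """Split one conversation into single utterances and single-turn dialogues.

    A single-turn dialogue here is a (user_text, bot_text) pair in which the
    chatbot delivers the final utterance, as the abstract describes.
    """
    # Every utterance on its own, regardless of speaker.
    utterances = [text for _, text in dialogue]

    # Adjacent user -> bot pairs; pairs ending with the user are skipped.
    single_turn = []
    for (prev_speaker, prev_text), (speaker, text) in zip(dialogue, dialogue[1:]):
        if prev_speaker == "user" and speaker == "bot":
            single_turn.append((prev_text, text))

    return {"utterances": utterances, "single_turn": single_turn}


example = [
    ("user", "Hi there"),
    ("bot", "Hello! How can I help?"),
    ("user", "Tell me a story"),
    ("bot", "Once upon a time..."),
]
result = decompose(example)
print(len(result["utterances"]), "utterances,", len(result["single_turn"]), "single-turn dialogues")
```

Each resulting item could then be passed to an annotator model for labeling; that step is left out here because the excerpt does not give the prompts or label format.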
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Text Readability and Simplification · Topic Modeling
Methods: Attention Is All You Need · Weight Decay · Linear Warmup With Linear Decay · WordPiece · Softmax · Dense Connections · Attention Dropout · Absolute Position Encodings · BERT
