Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models

Huachuan Qiu; Shuai Zhang; Hongliang He; Anqi Li; Zhenzhong Lan

arXiv:2403.13250·cs.CL·March 21, 2024·1 cites

TL;DR

This paper introduces CensorChat, a dataset for detecting pornographic content in dialogue systems. It uses knowledge distillation from large language models to annotate the data efficiently, so that text classifiers can be trained on the resulting labels.

Contribution

It presents a novel dataset and a knowledge distillation framework using large language models for effective pornographic text detection in open-domain dialogues.

Findings

01 Knowledge distillation with LLMs effectively annotates dialogue data.

02 GPT-4 calibration improves label quality.

03 Text classifiers trained on pseudo-labeled data achieve reliable detection.

Abstract

Pornographic content occurring in human-machine interaction dialogues can cause severe side effects for users in open-domain dialogue systems. However, research on detecting pornographic language within human-machine interaction dialogues is an important subject that is rarely studied. To advance in this direction, we introduce CensorChat, a dialogue monitoring dataset aimed at detecting whether the dialogue session contains pornographic content. To this end, we collect real-life human-machine interaction dialogues in the wild and break them down into single utterances and single-turn dialogues, with the last utterance spoken by the chatbot. We propose utilizing knowledge distillation of large language models to annotate the dataset. Specifically, first, the raw dataset is annotated by four open-source large language models, with the majority vote determining the label. Second, we use…
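
The first annotation step in the abstract (four annotator models, majority vote) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `majority_vote`, the 0/1 label encoding, and the choice to return `None` on a 2-2 tie are all assumptions, since the abstract excerpt does not say how ties are handled.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(labels: Sequence[int]) -> Optional[int]:
    """Return the most frequent label, or None if the top two labels tie.

    Assumption: labels are 0 (not pornographic) or 1 (pornographic).
    Assumption: ties are left unresolved (None) for later review.
    """
    if not labels:
        raise ValueError("at least one annotator label is required")
    counts = Counter(labels).most_common()
    # With an even number of annotators, a tie between the top two is possible.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical votes from four annotator models for one utterance.
votes = [1, 1, 0, 1]
label = majority_vote(votes)
```

Returning `None` on a tie keeps ambiguous items visible rather than silently assigning them to one class; a real pipeline might instead route such items to a stronger model or a human annotator.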

Code & Models

Repositories

qiuhuachuan/CensorChat
PyTorch · Official

Models

qiuhuachuan/NSFW-detector
Hugging Face model · 12 downloads · 10 likes

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Spam and Phishing Detection

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Softmax · Dropout · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Position-Wise Feed-Forward Layer