A Reinforcement Learning-based Offensive semantics Censorship System for Chatbots
Shaokang Cai, Dezhi Han, Zibin Zheng, Dun Li, Noel Crespi

TL;DR
This paper introduces a reinforcement learning-based system for offensive semantics censorship in chatbots, aiming to detect and purify offensive content efficiently while maintaining reply quality.
Contribution
It presents a novel semantics censorship framework combining offensive semantics detection and purification, with an integrated few-shot learning approach for rapid training.
Findings
Reduces offensive reply generation probability in chatbots.
Accelerates semantics purification with a once-through learning method.
Improves training speed while maintaining reply quality.
Abstract
The rapid development of artificial intelligence (AI) technology has enabled large-scale AI applications to reach the market and enter practical use. However, while AI technology has brought many conveniences to people during productization, it has also exposed many security issues. In particular, attacks against the online learning vulnerabilities of chatbots occur frequently. This paper therefore proposes a semantics censorship chatbot system based on reinforcement learning, composed mainly of two parts: the offensive semantics censorship model and the semantics purification model. Offensive semantics review combines the context of user input sentences to detect rapidly evolving offensive semantics and to respond to offensive semantics responses. The semantics purification model addresses the case in which a chatbot model has been contaminated by large numbers of offensive…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Ethics and Social Impacts of AI
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
