On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark
Hao Sun, Guangxuan Xu, Jiawen Deng, Jiale Cheng, Chujie Zheng, Hao Zhou, Nanyun Peng, Xiaoyan Zhu, Minlie Huang

TL;DR
This paper introduces a new taxonomy and dataset for dialogue safety, highlighting the inadequacy of current safety tools and proposing a classifier to detect unsafe behaviors in conversational models.
Contribution
It presents a novel taxonomy for dialogue safety, a new dataset with context-sensitive unsafe examples, and a safety classifier baseline for detecting unsafe behaviors.
Findings
Existing safety tools perform poorly on the dataset.
The safety classifier effectively detects context-sensitive unsafe behaviors.
Current conversational models still exhibit significant safety issues.
Abstract
Dialogue safety problems severely limit the real-world deployment of neural conversational models and have attracted great research interest recently. However, dialogue safety problems remain under-defined and corresponding datasets are scarce. We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors in human-bot dialogue settings, with a focus on context-sensitive unsafety, which is under-explored in prior work. To spur research in this direction, we compile DiaSafety, a dataset with rich context-sensitive unsafe examples. Experiments show that existing safety guarding tools fail severely on our dataset. As a remedy, we train a dialogue safety classifier to provide a strong baseline for context-sensitive dialogue unsafety detection. With our classifier, we perform safety evaluations on popular conversational models and show that existing dialogue…
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Speech and dialogue systems
