On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark

Hao Sun; Guangxuan Xu; Jiawen Deng; Jiale Cheng; Chujie Zheng; Hao Zhou; Nanyun Peng; Xiaoyan Zhu; Minlie Huang

arXiv:2110.08466·cs.CL·April 5, 2022

TL;DR

This paper introduces a new taxonomy and dataset for dialogue safety, highlighting the inadequacy of current safety tools and proposing a classifier to detect unsafe behaviors in conversational models.

Contribution

It presents a novel taxonomy for dialogue safety, a new dataset with context-sensitive unsafe examples, and a safety classifier baseline for detecting unsafe behaviors.

Findings

01 Existing safety tools perform poorly on the dataset.

02 The safety classifier effectively detects context-sensitive unsafe behaviors.

03 Current conversational models still exhibit significant safety issues.

Abstract

Dialogue safety problems severely limit the real-world deployment of neural conversational models and have attracted great research interest recently. However, dialogue safety problems remain under-defined and corresponding datasets are scarce. We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors in human-bot dialogue settings, with a focus on context-sensitive unsafety, which is under-explored in prior work. To spur research in this direction, we compile DiaSafety, a dataset with rich context-sensitive unsafe examples. Experiments show that existing safety guarding tools fail severely on our dataset. As a remedy, we train a dialogue safety classifier to provide a strong baseline for context-sensitive dialogue unsafety detection. With our classifier, we perform safety evaluations on popular conversational models and show that existing dialogue…

Code & Models

Repositories

thu-coai/diasafety
PyTorch · Official

Datasets

thu-coai/diasafety
dataset · 53 dl

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Speech and dialogue systems