Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks
Jingyan Zhou, Jiawen Deng, Fei Mi, Yitong Li, Yasheng Wang, Minlie Huang, Xin Jiang, Qun Liu, Helen Meng

TL;DR
This paper introduces a new framework and dataset for detecting social bias in Chinese dialog systems, providing benchmarks to improve safety and reduce biases in conversational AI.
Contribution
It proposes the Dial-Bias Frame for analyzing social bias, creates the first annotated Chinese bias dialog dataset, and establishes benchmarks for bias detection at multiple levels.
Findings
The Dial-Bias Frame enables comprehensive bias analysis.
The CDail-Bias Dataset is the first annotated Chinese social bias dialog dataset.
Benchmarks show the importance of detailed analysis for bias detection.
Abstract
Research on open-domain dialog systems has advanced greatly thanks to neural models trained on large-scale corpora; however, such corpora often introduce various safety problems (e.g., offensive language, biases, and toxic behaviors) that significantly hinder the deployment of dialog systems in practice. Among these safety issues, social bias is especially hard to address, because its negative impact on marginalized populations is usually expressed implicitly and therefore requires normative reasoning and rigorous analysis. In this paper, we focus our investigation on social bias detection as a dialog safety problem. We first propose a novel Dial-Bias Frame for analyzing social bias in conversations pragmatically, which supports more comprehensive bias-related analyses than simple dichotomy (biased vs. not biased) annotations. Based on the proposed framework, we further introduce the CDail-Bias Dataset that, to…
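The abstract contrasts multi-aspect frame annotations with simple dichotomy labels. The sketch below illustrates that contrast under a hypothetical schema: the names `FrameAnnotation`, `DataType`, `Attitude`, `binary_label`, and `fine_label`, and all label values, are invented for this example and are not the paper's published schema or code. It shows how a single multi-aspect record can be read at a coarse (binary) level and at a finer level, and how the binary view merges cases that the finer view keeps apart.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    # Hypothetical label set: does the utterance concern a social group, and how?
    IRRELEVANT = "irrelevant"
    BIAS_DISCUSSING = "bias_discussing"
    BIAS_EXPRESSING = "bias_expressing"


class Attitude(Enum):
    # Hypothetical implied-attitude labels toward the targeted group.
    BIASED = "biased"
    ANTI_BIAS = "anti_bias"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class FrameAnnotation:
    """One multi-aspect annotation of a (context, response) pair.

    Field names and label values are illustrative assumptions only.
    """
    context_sensitive: bool        # assumed aspect: is the dialog context needed to judge?
    data_type: DataType
    target_group: Optional[str]    # e.g. "gender"; None when irrelevant
    attitude: Optional[Attitude]   # None when irrelevant


def binary_label(a: FrameAnnotation) -> int:
    """Coarsest level: 1 = biased, 0 = not biased (a simple dichotomy)."""
    return int(a.attitude is Attitude.BIASED)


def fine_label(a: FrameAnnotation) -> str:
    """Finer level: keeps data type and attitude, which the binary view drops."""
    if a.data_type is DataType.IRRELEVANT:
        return "irrelevant"
    if a.attitude is None:
        raise ValueError("a relevant utterance needs an attitude label")
    return f"{a.data_type.value}/{a.attitude.value}"


# Hypothetical examples: the first two share binary label 0 but differ at the finer level.
anti = FrameAnnotation(True, DataType.BIAS_DISCUSSING, "gender", Attitude.ANTI_BIAS)
off_topic = FrameAnnotation(False, DataType.IRRELEVANT, None, None)
biased = FrameAnnotation(True, DataType.BIAS_EXPRESSING, "region", Attitude.BIASED)

for ann in (anti, off_topic, biased):
    print(binary_label(ann), fine_label(ann))
```

Running it prints one line per example: the anti-bias discussion and the off-topic reply both receive binary label 0, while the finer label distinguishes them, which is the kind of information a dichotomy annotation loses.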
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
