CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models
Yufei Huang, Deyi Xiong

TL;DR
This paper introduces CBBQ, a Chinese Bias Benchmark dataset built through human-AI collaboration to detect societal biases tied to Chinese culture and values in large language models. Extensive experiments show that such biases are prevalent and that some models can partially self-correct.
Contribution
The work presents a novel Chinese bias dataset constructed via a structured human-AI process, enabling effective bias detection and analysis in Chinese large language models.
Findings
All tested models exhibit significant biases in certain categories.
Fine-tuned models can partially avoid morally harmful outputs.
The dataset effectively detects biases in Chinese language models.
Abstract
Holistically measuring societal biases of large language models is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values. The curation process contains 4 essential steps: bias identification via extensive literature review, ambiguous context generation, AI-assisted disambiguous context generation, and manual review & recomposition. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. The dataset exhibits wide coverage and high diversity. Extensive experiments demonstrate the effectiveness of the dataset in…
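The abstract says that testing instances are derived automatically from manually authored templates. A minimal sketch of that general idea follows, assuming templates are strings with named slots that are filled with every combination of candidate values. The function name `expand_template`, the slot names, and the example text are all hypothetical and are not taken from the dataset or the authors' code.

```python
from itertools import product
from typing import Dict, List


def expand_template(template: str, slots: Dict[str, List[str]]) -> List[str]:
    """Fill every combination of slot values into a template string."""
    names = list(slots)
    instances = []
    # Cartesian product over the candidate values of each slot.
    for values in product(*(slots[name] for name in names)):
        instances.append(template.format(**dict(zip(names, values))))
    return instances


# Hypothetical template and slot values, invented for illustration only.
template = "{person_a} and {person_b} applied for the same job. Who was hired?"
slots = {
    "person_a": ["Person A"],
    "person_b": ["Person B", "Person C"],
}

questions = expand_template(template, slots)
for question in questions:
    print(question)
```

Under this assumption, a small number of templates multiplies into many instances, which is one plausible way 3K+ templates could yield over 100K questions.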
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Natural Language Processing Techniques
