Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings
Zhao Liu, Tian Xie, Xueru Zhang

TL;DR
This paper introduces Open-BBQ, an extended bias evaluation dataset for LLMs in open-ended settings, and proposes Composite Prompting to effectively reduce social bias while preserving accuracy.
Contribution
It extends existing bias benchmarks to open-ended responses and develops a new debiasing method combining structured examples with chain-of-thought reasoning.
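To make the idea of combining structured examples with chain-of-thought reasoning concrete, here is a minimal sketch of how such a prompt might be assembled. It is not the authors' prompt: the `Example` class, the `build_composite_prompt` function, the field labels, and the demonstration text are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical demonstration: context, question, reasoning, answer."""
    context: str
    question: str
    reasoning: str
    answer: str


def build_composite_prompt(examples, context, question):
    """Join worked examples with a step-by-step reasoning cue for a new question.

    Assumption: the layout and wording below are illustrative, not taken
    from the paper.
    """
    parts = []
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\n"
            f"Context: {ex.context}\n"
            f"Question: {ex.question}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Answer: {ex.answer}"
        )
    # The new question ends with an open reasoning cue for the model to continue.
    parts.append(
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Reasoning: Let's think step by step, using only what the context states."
    )
    return "\n\n".join(parts)


# Hypothetical demonstration in the style of an ambiguous-context question.
demo = Example(
    context="Two colleagues, one older and one younger, attended a software training.",
    question="Who struggled with the training?",
    reasoning="The context does not say that either person struggled.",
    answer="Not enough information.",
)
prompt = build_composite_prompt(
    [demo],
    context="A new context goes here.",
    question="A new open-ended question goes here.",
)
```

The resulting string would be sent to the model as a single prompt. Keeping the demonstrations as data rather than hard-coded text makes it easy to swap in different examples per question category.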
Findings
Open-BBQ enables bias evaluation in open-ended responses.
Composite Prompting significantly reduces bias in GPT-3.5 and GPT-4o.
The method maintains high accuracy while debiasing.
Abstract
Current social bias benchmarks for Large Language Models (LLMs) primarily rely on predefined question formats like multiple-choice, limiting their ability to reflect the complexity and open-ended nature of real-world interactions. To close this gap, we extend an existing dataset, BBQ (Parrish et al., 2022), to Open-BBQ, a comprehensive framework for evaluating the social bias of LLMs in open-ended settings by incorporating two additional question categories: fill-in-the-blank and short-answer. Because Open-BBQ contains many open-ended responses such as sentences and paragraphs, we developed an evaluation process that detects bias in open-ended content by labeling sentences and paragraphs. We also found that existing debiasing methods, such as self-debiasing (Gallegos et al., 2024), have over-correction issues, which make the original correct answers…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Interpreting and Communication in Healthcare · Topic Modeling
