Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings

Zhao Liu; Tian Xie; Xueru Zhang

arXiv:2412.06134·cs.CL·October 16, 2025

TL;DR

This paper introduces Open-BBQ, an extended bias evaluation dataset for LLMs in open-ended settings, and proposes Composite Prompting to effectively reduce social bias while preserving accuracy.

Contribution

It extends existing bias benchmarks to open-ended responses and develops a new debiasing method combining structured examples with chain-of-thought reasoning.
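As a rough illustration of what "combining structured examples with chain-of-thought reasoning" could look like in practice, the sketch below assembles a few-shot prompt in which each example carries an explicit reasoning field, followed by the target question and a step-by-step cue. This is not the authors' implementation: the `Example` class, the `build_composite_prompt` function, the field labels, and the sample item are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One structured demonstration (hypothetical layout, not from the paper)."""
    context: str
    question: str
    reasoning: str
    answer: str


def build_composite_prompt(examples: List[Example], context: str, question: str) -> str:
    """Join structured few-shot examples with a chain-of-thought cue for the target item."""
    parts = []
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\n"
            f"Context: {ex.context}\n"
            f"Question: {ex.question}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Answer: {ex.answer}"
        )
    # The target item ends with an open reasoning slot so the model
    # writes its reasoning before committing to an answer.
    parts.append(
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Reasoning: Let's think step by step."
    )
    return "\n\n".join(parts)


# Illustrative item only; it is not taken from the Open-BBQ dataset.
demo = Example(
    context="Two coworkers were late to the meeting.",
    question="Who is bad at managing time?",
    reasoning="The context says both were late and gives no other evidence about either person.",
    answer="Not enough information.",
)
prompt = build_composite_prompt(
    [demo],
    context="Two neighbors were talking about a broken fence.",
    question="Who broke the fence?",
)
```

The resulting `prompt` string would then be sent to whichever model is being evaluated. Keeping the reasoning field inside each example is one way to show the model that "not enough information" is an acceptable answer when the context is ambiguous.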

Findings

01. Open-BBQ enables bias evaluation in open-ended responses.

02. Composite Prompting significantly reduces bias in GPT-3.5 and GPT-4o.

03. The method maintains high accuracy while debiasing.

Abstract

Current social bias benchmarks for Large Language Models (LLMs) primarily rely on predefined question formats like multiple-choice, limiting their ability to reflect the complexity and open-ended nature of real-world interactions. To close this gap, we extend an existing dataset BBQ (Parrish et al., 2022) to Open-BBQ, a comprehensive framework to evaluate the social bias of LLMs in open-ended settings by incorporating two additional question categories: fill-in-the-blank and short-answer. Since our new Open-BBQ dataset contains a lot of open-ended responses like sentences and paragraphs, we developed an evaluation process to detect biases from open-ended content by labeling sentences and paragraphs. In addition to this, we also found that existing debiasing methods, such as self-debiasing (Gallegos et al., 2024), have over-correction issues, which make the original correct answers…

Code & Models

Repositories

zhaoliu0914/LLM-Bias-Benchmark
Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Interpreting and Communication in Healthcare · Topic Modeling