CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

Song Wang; Peng Wang; Tong Zhou; Yushun Dong; Zhen Tan; Jundong Li

arXiv:2407.02408·cs.CL·February 25, 2025

TL;DR

This paper introduces CEB, a comprehensive benchmark for evaluating various biases in large language models across multiple social groups and tasks, addressing inconsistencies in existing bias evaluation methods.

Contribution

The paper proposes a new compositional taxonomy and a unified evaluation benchmark, CEB, to systematically assess bias types in LLMs across social groups and NLP tasks.

Findings

1. Bias levels vary across social groups and tasks.
2. CEB provides a standardized framework for bias evaluation.
3. CEB can guide the development of targeted bias mitigation strategies.

Abstract

As Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks, concerns regarding the potential negative societal impacts of LLM-generated content have also arisen. To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets. However, existing bias evaluation efforts often focus on only a particular type of bias and employ inconsistent evaluation metrics, leading to difficulties in comparison across different datasets and LLMs. To address these limitations, we collect a variety of datasets designed for the bias evaluation of LLMs, and further propose CEB, a Compositional Evaluation Benchmark that covers different types of bias across different social groups and tasks. The curation of CEB is based on our newly proposed compositional taxonomy, which characterizes each dataset from three…

Code & Models

Datasets

stan-hua/ceb (dataset, 202 downloads)

Taxonomy

Topics: Ethics and Social Impacts of AI

Methods: Focus