CALM: A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias

Vipul Gupta; Pranav Narayanan Venkit; Hugo Laurençon; Shomir Wilson; Rebecca J. Passonneau

arXiv:2308.12539·cs.CL·August 9, 2024

TL;DR

CALM introduces a robust, multi-task benchmark for assessing gender and race bias in language models, addressing limitations of previous measures by using diverse templates and multiple NLP tasks.

Contribution

This work presents CALM, a comprehensive, multi-task benchmark with diverse templates for more reliable bias measurement across language models.

Findings

01. CALM bias scores are more robust and less sensitive to template perturbations than prior bias measures.

02. Larger models tend to exhibit more bias than smaller ones.

03. T0 models are among the least biased of the evaluated language models.

Abstract

As language models (LMs) become increasingly powerful and widely used, it is important to quantify them for sociodemographic bias with potential for harm. Prior measures of bias are sensitive to perturbations in the templates designed to compare performance across social groups, due to factors such as low diversity or limited number of templates. Also, most previous work considers only one NLP task. We introduce Comprehensive Assessment of Language Models (CALM) for robust measurement of two types of universally relevant sociodemographic bias, gender and race. CALM integrates sixteen datasets for question-answering, sentiment analysis and natural language inference. Examples from each dataset are filtered to produce 224 templates with high diversity (e.g., length, vocabulary). We assemble 50 highly frequent person names for each of seven distinct demographic groups to generate 78,400…
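The template-and-name construction described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the `<PERSON>` placeholder, the `fill_templates` helper, and the toy templates and names are hypothetical and are not taken from the CALM release. The sketch only shows that pairing 224 templates with 50 names for each of seven groups yields 224 × 50 × 7 = 78,400 filled examples, which matches the count given in the abstract.

```python
from itertools import product


def fill_templates(templates, names_by_group, slot="<PERSON>"):
    """Yield (group, name, text) for every template/name pairing.

    `slot` is a hypothetical placeholder token, not CALM's actual format.
    """
    for template, (group, names) in product(templates, names_by_group.items()):
        for name in names:
            yield group, name, template.replace(slot, name)


# Toy stand-ins: the real templates and name lists are not reproduced here.
templates = [f"Template {i}: <PERSON> went to the store." for i in range(224)]
names_by_group = {
    f"group_{g}": [f"Name{g}_{n}" for n in range(50)] for g in range(7)
}

examples = list(fill_templates(templates, names_by_group))
print(len(examples))  # 78400
```

Keeping the group label alongside each filled example makes it straightforward to aggregate model performance per demographic group afterwards, which is the comparison a template-based bias measure relies on.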

Code & Models

Repositories

vipulgupta1011/calm · Official

Datasets

vipulgupta/CALM
dataset · 16 dl

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques

Methods: OPT