SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration

Xin Guan; Ze Wang; Nathaniel Demchak; Saloni Gupta; Ediz Ertekin Jr.; Adriano Koshiyama; Emre Kazim; Zekun Wu

arXiv:2409.11149·cs.CL·January 31, 2025

TL;DR

SAGED is a comprehensive benchmarking pipeline that detects, analyzes, and mitigates biases in large language models through a multi-stage process and new disparity metrics.

Contribution

It introduces a holistic bias benchmarking pipeline with novel metrics and mitigation techniques, addressing limitations of existing bias evaluation methods.

Findings

01. Models show bias against certain countries like Russia and China.

02. Bias varies when models role-play different personas.

03. Qwen2 and Mistral are less responsive to role-playing prompts.

Abstract

The development of unbiased large language models is widely recognized as crucial, yet existing benchmarks fall short in detecting biases due to limited scope, contamination, and lack of a fairness baseline. SAGED(bias) is the first holistic benchmarking pipeline to address these problems. The pipeline encompasses five core stages: scraping materials, assembling benchmarks, generating responses, extracting numeric features, and diagnosing with disparity metrics. SAGED includes metrics for max disparity, such as impact ratio, and bias concentration, such as Max Z-scores. Noticing that metric tool bias and contextual bias in prompts can distort evaluation, SAGED implements counterfactual branching and baseline calibration for mitigation. For demonstration, we use SAGED on G20 Countries with popular 8b-level models including Gemma2, Llama3.1, Mistral, and Qwen2. With sentiment analysis, we…
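
The two metric families named in the abstract can be illustrated with a minimal sketch. This is not the SAGED implementation: the function names, the input layout (a dict mapping each group to its extracted feature scores, such as sentiment), and the exact formulas are assumptions. Here impact ratio is taken as the lowest group mean divided by the highest, and the max Z-score as the largest absolute Z-score among group means.

```python
from statistics import mean, pstdev


def group_means(group_scores):
    """Average the per-response feature scores (e.g. sentiment) for each group."""
    return {group: mean(scores) for group, scores in group_scores.items()}


def impact_ratio(group_scores):
    """Assumed definition: lowest group mean divided by highest group mean.

    A value of 1.0 means all groups score equally; values near 0 indicate
    a large maximum disparity. Scores are assumed to be positive.
    """
    means = list(group_means(group_scores).values())
    highest = max(means)
    if highest <= 0:
        raise ValueError("impact ratio needs a positive highest group mean")
    return min(means) / highest


def max_z_score(group_scores):
    """Assumed definition: largest absolute Z-score among the group means.

    A high value suggests the disparity is concentrated in one outlying group.
    """
    means = list(group_means(group_scores).values())
    centre = mean(means)
    spread = pstdev(means)  # population standard deviation across groups
    if spread == 0:
        return 0.0
    return max(abs(m - centre) / spread for m in means)


# Hypothetical sentiment scores in [0, 1] for three illustrative groups.
example_scores = {
    "group_a": [0.8, 0.6],
    "group_b": [0.4, 0.2],
    "group_c": [0.5, 0.5],
}
ratio = impact_ratio(example_scores)
concentration = max_z_score(example_scores)
```

In this reading, the impact ratio captures how far apart the best and worst treated groups are, while the max Z-score indicates whether that gap comes from a single outlier rather than a spread across all groups.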

Code & Models

Repositories

holistic-ai/SAGED-Bias (Official)

Taxonomy

Topics: Natural Language Processing Techniques