Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models

Natalie Mackraz, Nivedha Sivakumar, Samira Khorshidi, Krishna Patel, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff

arXiv:2412.03537 · cs.CL · December 5, 2024

Open Access

TL;DR

This study investigates how intrinsic biases in large language models transfer to their prompted versions, revealing strong correlations and emphasizing the importance of addressing bias at the pre-training stage for fair downstream applications.

Contribution

The paper extends bias transfer analysis to causal language models adapted through prompting, demonstrating a high correlation between the biases of pre-trained and prompted models across various settings.

Findings

01

Bias in pre-trained models strongly correlates with biases in prompted models.

02

Bias transfer remains high even when models are prompted to be fair or biased.

03

Bias correlations are consistent across different prompt lengths and stereotypical content.

Abstract

Large language models (LLMs) are increasingly being adapted to achieve task specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness, finding that fairness in pre-trained masked language models has limited effect on the fairness of models adapted using fine-tuning. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible and compute-efficient way to deploy models in real-world systems. In contrast to previous works, we establish that intrinsic biases in pre-trained Mistral, Falcon and Llama models are strongly correlated (ρ ≥ 0.94) with biases when the same models are zero- and few-shot prompted, using a pronoun co-reference resolution task. Further, we find that…
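The abstract's headline number is a correlation coefficient (ρ) between the intrinsic bias of a pre-trained model and the bias of its prompted versions. The sketch below shows how such a correlation could be computed, under stated assumptions. The summary does not specify the bias metric or the correlation type, so this example assumes bias is scored as a hypothetical accuracy gap between pro- and anti-stereotypical co-reference examples, measured for several paired units (e.g. occupations). It reports both Pearson (linear) and Spearman (rank) correlation. The function names and every number in the usage section are illustrative only and are not taken from the paper.

```python
"""Hypothetical sketch: correlate bias scores of a pre-trained model
with those of its prompted version. Metric and data are assumptions."""

from math import sqrt


def bias_score(acc_pro: float, acc_anti: float) -> float:
    # Assumed metric (not from the paper): accuracy on pro-stereotypical
    # co-reference examples minus accuracy on anti-stereotypical ones.
    return acc_pro - acc_anti


def pearson(xs: list[float], ys: list[float]) -> float:
    # Linear (Pearson) correlation of two paired sequences.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    # Rank (Spearman) correlation: Pearson correlation of the ranks.
    return pearson(ranks(xs), ranks(ys))


# Illustrative, made-up measurements for four hypothetical units.
pre_trained = [
    bias_score(0.90, 0.60),
    bias_score(0.80, 0.70),
    bias_score(0.95, 0.50),
    bias_score(0.70, 0.65),
]
prompted = [0.25, 0.12, 0.40, 0.02]

print(f"Pearson r    = {pearson(pre_trained, prompted):.3f}")
print(f"Spearman rho = {spearman(pre_trained, prompted):.3f}")
```

Because the two illustrative lists rank their units in the same order, the Spearman value is 1.0 up to floating-point rounding. A rank correlation is a natural choice when only the ordering of bias levels matters, whereas Pearson also reflects how linearly the magnitudes move together.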

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: LLaMA