Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis

Chahat Raj; Anjishnu Mukherjee; Aylin Caliskan; Antonios Anastasopoulos; Ziwei Zhu

arXiv:2407.02030·cs.CL·July 3, 2024·2 cites

Open Access

TL;DR

This paper investigates social biases in large language models and introduces a novel debiasing method inspired by social psychology's Contact Hypothesis, demonstrating significant bias reduction through targeted instruction tuning.

Contribution

It applies the Contact Hypothesis to LLM debiasing, creating a new dataset and a unique instruction-tuning technique that effectively reduces social biases.

Findings

01 Biases can be reduced by up to 40% with one epoch of tuning.

02 Simulated social contact influences bias levels in LLMs.

03 The proposed method outperforms baseline approaches.

Abstract

Large Language Models (LLMs) perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis, a concept from social psychology, for debiasing LLMs. We simulate various forms of social contact through LLM prompting to measure their influence on the model's biases, mirroring how intergroup interactions can reduce prejudices in social contexts. We create a dataset of 108,000 prompts following a principled approach replicating social contact to measure biases in three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 social bias dimensions. We propose a unique debiasing technique, Social Contact Debiasing (SCD), that instruction-tunes these models with unbiased responses to prompts. Our research demonstrates that LLM responses exhibit social biases when subject to contact…
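The measurement idea described above (prepend a simulated contact scenario to a question, then compare how often responses are biased before and after tuning) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the way a contact scenario is joined to a question, the per-response boolean bias labels, and the use of relative reduction as the summary number are hypothetical and are not taken from the paper, whose actual prompt templates and metric may differ.

```python
def build_contact_prompt(contact_scenario: str, question: str) -> str:
    """Join a (hypothetical) contact scenario and a question into one prompt.

    An empty scenario yields the plain question, i.e. a no-contact condition.
    """
    return f"{contact_scenario.strip()} {question.strip()}".strip()


def bias_rate(is_biased: list[bool]) -> float:
    """Fraction of responses labeled as biased (the labels are assumed given)."""
    if not is_biased:
        raise ValueError("need at least one labeled response")
    return sum(is_biased) / len(is_biased)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop in bias rate, e.g. 0.5 -> 0.3 is a 40% reduction."""
    if before <= 0:
        raise ValueError("baseline bias rate must be positive")
    return (before - after) / before


if __name__ == "__main__":
    prompt = build_contact_prompt(
        "I have a close friend from group X.",  # placeholder scenario
        "Should I trust people from group X?",  # placeholder question
    )
    print(prompt)

    # Toy labels only; these are not results from the paper.
    base = bias_rate([True, True, False, False])     # 0.5
    tuned = bias_rate([True, False, False, False])   # 0.25
    print(f"relative reduction: {relative_reduction(base, tuned):.0%}")
```

Under this reading, a figure such as "up to 40%" would correspond to `relative_reduction` reaching 0.4 for some model and bias dimension, though whether the paper reports a relative or absolute reduction is not stated in the text above.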

Taxonomy

Topics: Innovation and Knowledge Management

Methods: LLaMA