Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes

Isabel O. Gallegos; Ryan A. Rossi; Joe Barrow; Md Mehrab Tanjim; Tong Yu; Hanieh Deilamsalehy; Ruiyi Zhang; Sungchul Kim; Franck Dernoncourt

arXiv:2402.01981·cs.CL·February 6, 2024·2 cites

TL;DR

This paper introduces a zero-shot self-debiasing method for large language models that reduces social stereotypes without needing to modify training data or model parameters, using simple prompts and explanations.

Contribution

The work presents a novel zero-shot bias mitigation technique called self-debiasing, which relies solely on the language model and simple prompts, avoiding retraining or fine-tuning.

Findings

01. Self-debiasing significantly reduces stereotyping across nine social groups.

02. Explanation-based self-debiasing correctly identifies invalid assumptions.

03. Reprompting achieves the greatest bias reduction.

Abstract

Large language models (LLMs) have shown remarkable advances in language generation and understanding but are also prone to exhibiting harmful social biases. While recognition of these behaviors has generated an abundance of bias mitigation techniques, most require modifications to the training data, model parameters, or decoding strategy, which may be infeasible without access to a trainable model. In this work, we leverage the zero-shot capabilities of LLMs to reduce stereotyping in a technique we introduce as zero-shot self-debiasing. With two approaches, self-debiasing via explanation and self-debiasing via reprompting, we show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups while relying only on the LLM itself and a simple prompt, with explanations correctly identifying invalid assumptions and reprompting delivering the…
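The two approaches named in the abstract can be sketched as short prompting flows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording in `EXPLAIN_PROMPT` and `REPROMPT`, the chat-message format, and the `generate` callable are all hypothetical, since the excerpt does not give the paper's exact prompts or model interface.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a chat history to a reply.
Message = Dict[str, str]
Generate = Callable[[List[Message]], str]

# Assumed prompt wording for illustration only; not taken from the paper.
EXPLAIN_PROMPT = (
    "Explain which of the answer choices below rely on invalid "
    "assumptions about a social group."
)
REPROMPT = "Remove any bias from your previous answer and answer again."


def _msg(role: str, content: str) -> Message:
    """Build one chat message in the assumed {role, content} format."""
    return {"role": role, "content": content}


def baseline(generate: Generate, question: str) -> str:
    """Ask the question once, with no debiasing step."""
    return generate([_msg("user", question)])


def debias_via_explanation(generate: Generate, question: str) -> str:
    """Ask for an explanation of invalid assumptions, then ask the question."""
    history = [_msg("user", f"{EXPLAIN_PROMPT}\n\n{question}")]
    explanation = generate(history)
    # Keep the explanation in context so the final answer can draw on it.
    history.append(_msg("assistant", explanation))
    history.append(_msg("user", question))
    return generate(history)


def debias_via_reprompting(generate: Generate, question: str) -> str:
    """Get a first answer, then ask the model to revise it without bias."""
    history = [_msg("user", question)]
    first_answer = generate(history)
    history.append(_msg("assistant", first_answer))
    history.append(_msg("user", REPROMPT))
    return generate(history)


def stub_generate(messages: List[Message]) -> str:
    """Stand-in model for a dry run; replace with a real LLM call."""
    return f"(reply after {len(messages)} message(s))"


if __name__ == "__main__":
    q = "Who is more likely to be late? (A) Person X (B) Person Y (C) Unknown"
    print(baseline(stub_generate, q))
    print(debias_via_explanation(stub_generate, q))
    print(debias_via_reprompting(stub_generate, q))
```

Both flows need only two model calls and no access to weights, training data, or decoding, which is the property the abstract emphasizes. To try the sketch against a real model, wrap the model's chat call in a function with the `Generate` signature and pass it in place of `stub_generate`.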

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling