Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes
Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Tong Yu, Hanieh Deilamsalehy, Ruiyi Zhang, Sungchul Kim, Franck Dernoncourt

TL;DR
This paper introduces a zero-shot self-debiasing method for large language models that reduces social stereotypes without needing to modify training data or model parameters, using simple prompts and explanations.
Contribution
The work presents a novel zero-shot bias mitigation technique called self-debiasing, which relies solely on the language model and simple prompts, avoiding retraining or fine-tuning.
Findings
Self-debiasing significantly reduces stereotyping across nine social groups.
Explanation-based self-debiasing correctly identifies invalid assumptions.
Reprompting achieves the greatest bias reduction.
Abstract
Large language models (LLMs) have shown remarkable advances in language generation and understanding but are also prone to exhibiting harmful social biases. While recognition of these behaviors has generated an abundance of bias mitigation techniques, most require modifications to the training data, model parameters, or decoding strategy, which may be infeasible without access to a trainable model. In this work, we leverage the zero-shot capabilities of LLMs to reduce stereotyping in a technique we introduce as zero-shot self-debiasing. With two approaches, self-debiasing via explanation and self-debiasing via reprompting, we show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups while relying only on the LLM itself and a simple prompt, with explanations correctly identifying invalid assumptions and reprompting delivering the…
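The two approaches named in the abstract can be sketched as prompt-only wrappers around any text-generation function. This is a minimal illustration under stated assumptions, not the authors' implementation: the instruction wording, the function names, and the `generate` callable (any function mapping a prompt string to a completion string) are all hypothetical.

```python
from typing import Callable, List

# Hypothetical instruction wording, for illustration only.
# These are NOT the exact prompts used in the paper.
EXPLAIN_INSTRUCTION = (
    "Explain which answer choices rely on invalid assumptions about social groups."
)
REPROMPT_INSTRUCTION = (
    "Remove any bias from your previous answer and answer the question again."
)

Generate = Callable[[str], str]  # assumed interface: prompt in, completion out


def format_question(question: str, choices: List[str]) -> str:
    """Render a multiple-choice question with lettered options."""
    lettered = [f"({chr(ord('a') + i)}) {c}" for i, c in enumerate(choices)]
    return question + "\n" + "\n".join(lettered)


def debias_via_explanation(generate: Generate, question: str, choices: List[str]) -> str:
    """Ask for an explanation of invalid assumptions first, then answer with it in context."""
    q = format_question(question, choices)
    explanation = generate(f"{EXPLAIN_INSTRUCTION}\n\n{q}")
    return generate(f"{q}\n\nExplanation: {explanation}\n\nAnswer the question.")


def debias_via_reprompting(generate: Generate, question: str, choices: List[str]) -> str:
    """Answer once, then ask the model to revise its own answer to remove bias."""
    q = format_question(question, choices)
    first_answer = generate(q)
    return generate(f"{q}\n\nPrevious answer: {first_answer}\n\n{REPROMPT_INSTRUCTION}")
```

Both functions make exactly two model calls and leave training data, parameters, and decoding untouched, which is the property the abstract emphasizes. Passing `generate` as a plain callable keeps the sketch independent of any particular model API.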
Code & Models
- 🤗 aieng-lab/bert-base-cased-gradiend-gender-debiased · model
- 🤗 aieng-lab/bert-large-cased-gradiend-gender-debiased · model · 6 dl
- 🤗 aieng-lab/distilbert-base-cased-gradiend-gender-debiased · model · 6 dl
- 🤗 aieng-lab/roberta-large-gradiend-gender-debiased · model · 4 dl
- 🤗 aieng-lab/gpt2-gradiend-gender-debiased · model · 3 dl
- 🤗 aieng-lab/Llama-3.2-3B-gradiend-gender-debiased · model · 5 dl
- 🤗 aieng-lab/Llama-3.2-3B-Instruct-gradiend-gender-debiased · model · 5 dl
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
