Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts

Babak Hemmatian; Lav R. Varshney

arXiv:2208.04417 · cs.CL · August 11, 2022 · 1 cites

Open Access

TL;DR

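This study tests whether the Instruct Series version of GPT-3, fine-tuned to eliminate biased and toxic outputs, still associates Muslims with violence. Direct replications found only the weakest bias, but prompts containing common religion-associated names produced a highly significant increase in violent completions and a stronger second-order bias against Muslims, which points to the need for debiasing beyond current methods.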

Contribution

The paper shows that a debiased model still exhibits significant biases and stereotypes, especially when prompts contain religion-associated names, which highlights the limits of current debiasing techniques.

Findings

01. Debiased models show reduced but persistent biases.

02. Using religion-associated names increases violent completions.

03. Content analysis uncovers offensive, religion-specific violent themes.

Abstract

Recent work demonstrates a bias in the GPT-3 model towards generating violent text completions when prompted about Muslims, compared with Christians and Hindus. Two pre-registered replication attempts, one exact and one approximate, found only the weakest bias in the more recent Instruct Series version of GPT-3, fine-tuned to eliminate biased and toxic outputs. Few violent completions were observed. Additional pre-registered experiments, however, showed that using common names associated with the religions in prompts yields a highly significant increase in violent completions, also revealing a stronger second-order bias against Muslims. Names of Muslim celebrities from non-violent domains resulted in relatively fewer violent completions, suggesting that access to individualized information can steer the model away from using stereotypes. Nonetheless, content analysis revealed…
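
The core comparison described in the abstract, the share of violent completions per prompt group, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the keyword list, the function names `is_violent` and `violent_rates`, the group labels, and the toy completions are all assumptions invented for the example, and the paper's actual coding scheme is not given in this listing.

```python
from collections import defaultdict

# Hypothetical keyword list for illustration only; not the paper's coding scheme.
VIOLENT_KEYWORDS = {"shot", "killed", "bomb", "attacked"}


def is_violent(completion: str) -> bool:
    """Flag a completion if any whole word matches the illustrative keyword list."""
    words = {w.strip(".,!?;:\"'").lower() for w in completion.split()}
    return not words.isdisjoint(VIOLENT_KEYWORDS)


def violent_rates(samples):
    """Return {group: fraction of that group's completions flagged as violent}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [violent count, total count]
    for group, completion in samples:
        counts[group][0] += is_violent(completion)
        counts[group][1] += 1
    return {group: violent / total for group, (violent, total) in counts.items()}


# Placeholder (group, completion) pairs written by hand; these are not model outputs.
toy_samples = [
    ("group_a", "walked into a bakery and bought bread."),
    ("group_a", "walked into a bar and attacked a patron."),
    ("group_b", "walked into a library and read quietly."),
    ("group_b", "walked into a park and fed the birds."),
]

rates = violent_rates(toy_samples)
```

With real completions, the per-group rates would then feed a significance test across groups; keyword matching is only a crude stand-in for the human or rubric-based coding a study of this kind would need.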

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Terrorism, Counterterrorism, and Political Violence

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Residual Connection · Attention Dropout · Dense Connections · Dropout