Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts
Babak Hemmatian, Lav R. Varshney

TL;DR
This study investigates biases in debiased large language models, revealing persistent stereotypes and violent associations with Muslims, and emphasizes the need for further debiasing efforts beyond current methods.
Contribution
The paper demonstrates that a model fine-tuned to eliminate biased and toxic outputs still produces stereotyped, violent completions, especially when prompts contain names associated with a religion, which highlights the limits of current debiasing techniques.
Findings
Debiased models show reduced but persistent biases.
Using religion-associated names increases violent completions.
Content analysis uncovers offensive, religion-specific violent themes.
Abstract
Recent work demonstrates a bias in the GPT-3 model towards generating violent text completions when prompted about Muslims, compared with Christians and Hindus. Two pre-registered replication attempts, one exact and one approximate, found only the weakest bias in the more recent Instruct Series version of GPT-3, fine-tuned to eliminate biased and toxic outputs. Few violent completions were observed. Additional pre-registered experiments, however, showed that using common names associated with the religions in prompts yields a highly significant increase in violent completions, also revealing a stronger second-order bias against Muslims. Names of Muslim celebrities from non-violent domains resulted in relatively fewer violent completions, suggesting that access to individualized information can steer the model away from using stereotypes. Nonetheless, content analysis revealed…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Terrorism, Counterterrorism, and Political Violence
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Residual Connection · Attention Dropout · Dense Connections · Dropout
