Muslim-Violence Bias Persists in Debiased GPT Models
Babak Hemmatian, Razan Baltaji, Lav R. Varshney

TL;DR
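This study shows that anti-Muslim bias persists in GPT models despite debiasing. Direct prompts about religion elicit few violent completions in InstructGPT, but prompts that use common names associated with a religion raise the rate several-fold, and ChatGPT shows a much stronger bias regardless of prompt format. The results point to a need for continual debiasing.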
Contribution
The paper demonstrates that current debiasing techniques do not eliminate anti-Muslim bias in GPT models: the bias resurfaces through higher-order associations, such as common names associated with a religion, even when direct religion prompts show only a weak effect.
Findings
Debiasing reduces explicit bias but not higher-order associations.
Using common names associated with the religions in prompts increases the rate of violent completions several-fold, revealing a second-order anti-Muslim bias.
Bias is many times stronger in ChatGPT than in earlier models, regardless of prompt format.
Abstract
Abid et al. (2021) showed a tendency in GPT-3 to generate mostly violent completions when prompted about Muslims, compared with other religions. Two pre-registered replication attempts found few violent completions and only a weak anti-Muslim bias in the more recent InstructGPT, fine-tuned to eliminate biased and toxic outputs. However, more pre-registered experiments showed that using common names associated with the religions in prompts increases several-fold the rate of violent completions, revealing a significant second-order anti-Muslim bias. ChatGPT showed a bias many times stronger regardless of prompt format, suggesting that the effects of debiasing were reduced with continued model development. Our content analysis revealed religion-specific themes containing offensive stereotypes across all experiments. Our results show the need for continual de-biasing of models in ways that…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Dense Connections · Cosine Annealing · Linear Layer · Weight Decay
