WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models
Virginia K. Felkner, Ho-Chun Herbert Chang, Eugene Jang, Jonathan May

TL;DR
WinoQueer is a community-sourced benchmark for measuring anti-LGBTQ+ bias in large language models. Off-the-shelf models show considerable anti-queer bias, and fine-tuning on text written by or about the community mitigates it somewhat.
Contribution
The paper presents a novel community-in-the-loop method that generates a bias benchmark from a community survey, applies it to anti-LGBTQ+ bias, and shows that fine-tuning on data written by or about the community reduces that bias in LLMs.
Findings
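Off-the-shelf LLMs generally exhibit considerable anti-queer bias.
Fine-tuning on data written by or about the community somewhat reduces bias; social media text written by community members is more effective than news text written about the community by non-members.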
Community-driven benchmarks can guide bias mitigation strategies.
Abstract
We present WinoQueer: a benchmark specifically designed to measure whether large language models (LLMs) encode biases that are harmful to the LGBTQ+ community. The benchmark is community-sourced, via application of a novel method that generates a bias benchmark from a community survey. We apply our benchmark to several popular LLMs and find that off-the-shelf models generally do exhibit considerable anti-queer bias. Finally, we show that LLM bias against a marginalized community can be somewhat mitigated by finetuning on data written about or by members of that community, and that social media text written by community members is more effective than news text written about the community by non-members. Our method for community-in-the-loop benchmark development provides a blueprint for future researchers to develop community-driven, harms-grounded LLM benchmarks for other marginalized…
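The summary above does not say how the benchmark turns model outputs into a bias measurement. As a minimal sketch, assuming a paired-sentence setup of the kind used by other paired-sentence bias benchmarks, one could report the percentage of sentence pairs for which a model scores the biased sentence higher than its counterpart. The names `bias_score`, `toy_pairs`, and `toy_scores` are hypothetical, the sentence strings are placeholders, and the scores are made-up numbers standing in for model log-likelihoods.

```python
from typing import Callable, Iterable, Tuple


def bias_score(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Percentage of (biased, counterpart) pairs where `score` prefers the biased sentence.

    Assumption: under this scheme 50.0 means no systematic preference,
    and higher values mean the model favors the biased sentences more often.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    preferred = sum(1 for biased, counterpart in pairs if score(biased) > score(counterpart))
    return 100.0 * preferred / len(pairs)


# Placeholder pairs; a real benchmark would hold full sentences.
toy_pairs = [
    ("biased_1", "counter_1"),
    ("biased_2", "counter_2"),
    ("biased_3", "counter_3"),
    ("biased_4", "counter_4"),
]

# Made-up scores standing in for a model's sentence log-likelihoods.
toy_scores = {
    "biased_1": -10.0, "counter_1": -12.0,
    "biased_2": -9.0, "counter_2": -8.5,
    "biased_3": -7.0, "counter_3": -11.0,
    "biased_4": -6.0, "counter_4": -5.0,
}

print(bias_score(toy_pairs, toy_scores.__getitem__))
```

In practice `score` would wrap a language model's sentence-level likelihood. Comparing the value before and after fine-tuning is one way the mitigation effect described in the abstract could be quantified under this assumed metric.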
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
