WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models

Virginia K. Felkner; Ho-Chun Herbert Chang; Eugene Jang; Jonathan May

arXiv:2306.15087 · cs.CL · October 21, 2024 · 6 cites

TL;DR

WinoQueer is a community-sourced benchmark for measuring anti-LGBTQ+ bias in large language models. Off-the-shelf models show considerable anti-queer bias, and fine-tuning on text written by or about the community mitigates it somewhat.

Contribution

The paper presents a novel community-in-the-loop method that generates a bias benchmark from a community survey, applies it to harms against the LGBTQ+ community, and shows that fine-tuning on data written by or about the community somewhat mitigates bias in LLMs.

Findings

01

Off-the-shelf LLMs exhibit significant anti-queer bias.

02

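Fine-tuning on data written by or about the community somewhat mitigates bias; social media text written by community members is more effective than news text written about the community by non-members.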

03

Community-driven benchmarks can guide bias mitigation strategies.

Abstract

We present WinoQueer: a benchmark specifically designed to measure whether large language models (LLMs) encode biases that are harmful to the LGBTQ+ community. The benchmark is community-sourced, via application of a novel method that generates a bias benchmark from a community survey. We apply our benchmark to several popular LLMs and find that off-the-shelf models generally do exhibit considerable anti-queer bias. Finally, we show that LLM bias against a marginalized community can be somewhat mitigated by finetuning on data written about or by members of that community, and that social media text written by community members is more effective than news text written about the community by non-members. Our method for community-in-the-loop benchmark development provides a blueprint for future researchers to develop community-driven, harms-grounded LLM benchmarks for other marginalized…
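The abstract does not say how a model is scored on the benchmark. As a rough illustration only, the sketch below assumes a paired-sentence setup of the kind used by earlier bias benchmarks: each item pairs a harmful sentence with a counterpart, the model assigns each sentence a score such as a log-likelihood, and the bias score is the percentage of pairs where the harmful sentence scores higher. The function name `bias_score` and the toy numbers are hypothetical and do not come from the paper or its repository.

```python
from typing import Iterable, Tuple


def bias_score(pair_scores: Iterable[Tuple[float, float]]) -> float:
    """Percentage of pairs where the model scores the harmful sentence higher.

    Each element is (score_harmful, score_counterpart), e.g. log-likelihoods
    produced by some language model. Under this assumed metric, 50.0 would
    mean no systematic preference and higher values mean more bias.
    """
    pairs = list(pair_scores)
    if not pairs:
        raise ValueError("need at least one sentence pair")
    # Ties are counted as "not preferring the harmful sentence".
    preferred = sum(1 for harmful, counterpart in pairs if harmful > counterpart)
    return 100.0 * preferred / len(pairs)


# Toy, made-up scores for four sentence pairs (not real model output).
toy_scores = [(-12.1, -13.4), (-20.5, -19.0), (-8.3, -9.9), (-15.0, -15.7)]
print(bias_score(toy_scores))
```

Comparing this number before and after fine-tuning is one way the mitigation effect described in the abstract could be quantified under these assumptions.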

Code & Models

Repositories

katyfelkner/winoqueer
PyTorch · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection